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1.  Introduction 


Adaptive  computing  systems  (ACS)  rely  on  reconfigurable  hardware  to  adapt  the  system  opera¬ 
tion  to  changes  in  the  external  environment,  and  to  extend  mission  capability  by  implementing 
new  functions  on  the  same  hardware  platform.  This  results  in  increased  functional  density  and 
reduced  power  consumption  -  features  very  important  in  many  domains,  such  as  space  missions 
or  mobile  devices.  Field-programmable  gate  arrays  (FPGAs)  featuring  incremental  dynamic  run¬ 
time  reconfiguration  (RTR)  offer  additional  benefits  by  allowing  the  system  to  continue  to  execute 
uninterrupted,  while  portions  of  the  FPGA  are  reconfigured  for  new  logic  functions.  ACS  are 
often  deployed  in  harsh  and/or  hostile  remote  environments,  and  they  are  subject  to  strict  high- 
reliability  and  high-availability  requirements.  Cosmic  radiation  may  perturb  the  operation  of  a 
defect-free  FPGA,  and  marginal  defects  not  causing  failures  in  manufacturing  testing  (such  as  a 
short  initially  having  very  high  resistance)  may  become  active  with  the  aging  of  the  device,  or 
because  of  environmental  factors.  Since  direct  human  intervention  for  maintenance  and  repair  is 
impossible  in  such  environments,  fault-tolerant  (FT)  techniques  resulting  in  graceful  degradation 
must  be  used  to  achieve  the  desired  mission  life  even  in  the  presence  of  faults.  However,  tradi¬ 
tional  FT  design,  based  on  replicated  modular  redundancy  and  voting,  is  extremely  expensive 
given  the  space,  weight,  and  power  constraints  of  ACS. 

While  FPGA  testing  has  been  the  focus  of  many  research  and  development  efforts  [2,  18,  20,  21, 
25,  26,  27,  28,  33,  34,  41,  44, 45,  46,  52,  53,  55,  56,  57,  60],  most  of  this  previous  work  deals  with 
off-line  testing,  which  is  not  applicable  in  missions  relying  on  continuous  system  operation.  The 
only  attempt  at  on-line  FPGA  testing  has  been  the  pioneering  work  of  Shnidman  et  al.  [48].  The 
main  idea,  originally  proposed  by  Shombert  and  Siewiorek  in  the  context  of  testing  systolic  arrays 
[49],  is  to  have  only  a  relatively  small  portion  of  the  chip  off-line  and  being  tested,  while  the  rest 
of  the  device  is  on-line  and  continues  its  normal  operation.  Testing  of  the  entire  FPGA  is  accom¬ 
plished  by  repeatedly  moving  different  sections  of  the  system  logic  in  the  most  recently  tested 
part,  thus  allowing  a  new  area  of  the  chip  to  be  tested.  Figure  1  illustrates  this  process,  called  rov¬ 
ing  spares  in  [49]  and  fault  scanning  in  [48].  For  FPGAs,  roving  relies  on  RTR,  which  is  the  abil¬ 
ity  to  dynamically  reconfigure  part  of  an  FPGA  without  disturbing  the  operation  performed  in  the 
rest  of  the  device.  Compared  with  on-line  testing  based  on  modular  redundancy,  this  approach  has 
much  smaller  overhead.  Since  a  fault  occurring  in  the  working  area  will  be  detected  the  next  time 
when  the  fault  site  will  be  under  test,  the  fault  latency  (the  interval  between  the  occurrence  of  a 
fault  and  its  detection)  is  bounded  by  the  interval  required  to  test  the  entire  FPGA. 

Despite  these  advantages,  the  method  proposed  in  [48]  has  several  serious  limitations.  Although 
today  most  FPGAs  use  dedicated  wire  segments  for  inter-block  communication,  this  method  can 
work  only  for  bus-based  FPGAs.  Even  with  this  limitation,  it  is  not  applicable  to  any  existing 


Figure  1.  Roving  Area  Under  Test  Across  the  Chip 
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FPGA  architecture,  sinee  it  requires  a  modified  design  of  the  FPGA  blocks.  Moreover,  these  mod¬ 
ifications  introduce  a  substantial  overhead,  since  the  entire  configuration  memory  is  duplicated. 
This  method  does  not  address  faults  in  the  programmable  interconneet  network,  whieh  occupies 
most  of  the  area  of  an  FPGA,  and  it  is  also  limited  by  the  simplifying  assumption  that  the  config¬ 
uration  memory  and  flip-flops  (FFs)  are  fault-free. 

Most  FT  approaehes  for  FPGAs  target  yield  improvement  by  replaeing  faults  deteeted  in  manu- 
faeturing  test  with  spare  resources  to  achieve  a  fully  functional  chip  [7,  9,  10,  1 1,  12,  19,  22,  23, 
24,  29,  3 1,  36, 40, 42, 47,  58].  But  most  of  these  methods  are  not  applicable  for  on-line  fault  toler- 
anee,  because  they  ean  deal  with  only  a  limited  number  of  faults,  or  only  with  speeifie  faults,  or 
use  repair  teehniques  not  available  in  remote  environments,  or  require  exeessive  redundaney,  or 
need  excessive  computation  time  to  determine  how  to  bypass  faults.  The  method  of  Lach  et  al. 
[31]  divides  the  FPGA  into  tiles,  reserves  a  spare  eell  in  every  tile,  and  preeomputes  replaeement 
eonfigurations  for  every  possible  faulty  eell  in  a  tile.  This  removes  the  need  for  expensive  on-line 
computations  and  enables  fast  bypassing  of  detected  faults.  However,  this  method  cannot  deal 
with  more  than  one  fault  per  tile.  In  [32],  Lach  et  al.  extend  their  method  to  eonsider  intereonnect 
faults.  Similar  to  their  method  for  handling  logie  faults,  several  replacement  configurations  are 
precomputed  for  programmable  signals  that  do  not  extend  beyond  the  tile,  so  that  each  configura¬ 
tion  leaves  some  routing  resources  unused.  Then  when  a  fault  oeeurs,  they  try  to  find  a  eonfigura- 
tion  that  does  not  use  the  faulty  resource;  this  method,  however,  is  not  able  to  tolerate  all  possible 
faults  or  a  group  of  faults  in  the  same  tile.  For  signals  that  do  extend  between  tiles,  they  propose 
two  alternatives.  In  some  eases  the  tile  size  can  be  enlarged  to  encompass  the  signals.  This  works 
well  for  hierarehieal  architectures  like  complex  programmable  logic  devices  (CPLDs)  and  for 
dedicated,  linked  interconnect  paths  like  fast  carries.  For  signals  that  are  routed  through  intercon- 
neets  that  extend  across  several  tiles,  they  leave  a  long  segment  vaeant,  to  be  used  to  replaee  any 
parallel  segment  in  the  same  area.  But  after  one  faulty  segment  is  replaced  by  the  long  spare,  no 
additional  faults  can  be  tolerated  in  that  region. 

The  embryonics  approaeh,  developed  by  Mange  et  al.  [37,  38],  applies  evolutionary  eoneepts 
from  biology  to  achieve  self-repair  in  a  new  type  of  fine-grained  reconfigurable  array.  While  their 
results  are  fascinating,  this  approach  is  not  applicable  to  existing  FPGAs. 

All  the  FT  methods  diseussed  above  are  statie,  as  the  spare  resourees  they  provide  -  logie  cells 
and/or  interconneet  -  are  preassigned  in  the  FPGA.  Too  many  spares  cause  a  large  overhead, 
while  too  few  spares  may  result  in  not  having  sufficient  resources  available  in  areas  affected  by 
fault  clusters.  In  contrast,  the  dynamie  method  proposed  by  Mahapatra  and  Dutt  [36]  alloeates 
interconneet  resources  to  bypass  faults  only  after  faults  have  been  located.  If  the  new  required 
segments  conflict  with  the  current  usage  of  the  routing  tracks,  the  layout  is  incrementally  modified 
to  make  room  for  the  new  segments.  However,  this  method  is  dynamie  only  with  respeet  to  the 
interconneet  resources;  spare  cells  are  still  statieally  alloeated,  which  may  result  in  long  chains  of 
layout  changes  between  the  fault  sites  and  spare  cells. 

Our  approaeh  is  specifieally  targeted  toward  faults  that  appear  during  the  lifetime  of  the  ehip  and 
affeet  the  logie  in  a  STAR  when  that  area  is  under  test.  However,  transient  faults  may  aflfeet  the 
working  area  at  any  time.  For  transient  faults,  our  system  will  use  a  low-overhead  method  such  as 
concurrent  error  deteetion  (CED)  [62].  CED  and  roving  STARs  are  eomplementary  teehniques,  as 
CED  does  not  address  dormant  faults  and  does  not  provide  for  diagnosis.  At  the  system  level,  we 
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use  a  checkpointing  and  rollback  strategy.  A  rollback  restores  the  state  of  the  system  to  the  last 
state  saved  before  the  fault  occurred  (checkpoint),  and  it  should  be  initiated  whenever  CED 
detects  a  fault.  The  STARs  can  continue  roving  during  the  rollback  period,  which  is  viewed  as 
normal  system  activity.  (Faster  fault  location  can  be  achieved  if  the  CED  mechanism  can  indicate 
the  unit  likely  to  be  faulty  and  roving  starts  within  that  unit.)  However,  when  a  fault  is  detected  by 
a  STAR,  rollback  is  not  needed  because  either  it  is  already  in  progress  (if  CED  has  already 
encountered  the  same  fault),  or  the  fault  has  not  affected  the  recent  system  operation  and  hence  it 
must  be  a  dormant  one.  If  CED  has  detected  a  fault,  but  the  next  STAR  roving  round  does  not  con¬ 
firm  its  existence,  the  fault  must  have  been  a  transient  one  from  which  the  system  has  recovered 
by  rollback. 
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2.  Main  Contributions 


We  have  developed  the  first  integrated  approach  to  on-line  testing,  diagnosis,  and  fault  tolerance 
for  FPGAs,  where  all  three  subjects  are  concurrently  treated  in  an  unified  and  consistent  manner. 
Our  approach  inherits  the  basic  feature  of  [49]  and  [48];  namely,  we  have  self-testing  areas 
(STARs)  of  the  FPGA  that  are  roving  around  the  chip.  However,  our  roving  STARs  approach  pro¬ 
vides  many  significant  features  not  available  in  prior  work.  Our  approach  is  applicable  to  any 
FPGA  that  supports  RTR,  and  it  does  not  require  any  modifications  to  the  FPGA  architecture. 

To  satisfy  the  high-reliability  and  high-availability  requirements  of  ACS  hardware,  our  approach 
guarantees  complete  testing  of  both  logic  and  interconnect,  and  does  not  require  any  part  of  the 
chip  to  be  fault  free.  Complete  testing  means  that  all  the  FPGA  resources,  including  logic  and 
interconnect  currently  not  employed  by  the  application  logic,  will  be  tested  in  every  possible 
mode  of  operation.  This  strategy  could  be  considered  an  overkill:  why  don’t  we  test  only  the  used 
resources,  and  only  in  their  current  mode  of  operation?  The  main  reason  is  that  during  the  normal 
operation  of  an  adaptive  system,  the  FPGA  will  be  used  with  different  configurations  at  different 
times,  and  our  strategy  makes  certain  that  no  segment  of  the  FPGA  will  be  assigned  a  function 
that  may  be  incorrectly  performed  because  of  its  dormant  (not  yet  detected)  faults.  Also,  the 
resources  in  the  currently  unused  portions  of  the  FPGA  are  spares  to  be  used  for  fault  tolerance. 
But  since  failures  are  as  likely  to  affect  the  spares  as  the  working  area,  there  is  no  reason  to 
assume  that  the  spares  will  be  fault  free;  thus  the  spares  may  also  develop  dormant  faults,  unless 
they  are  tested  in  the  same  way  as  the  working  area.  Recently,  it  has  been  shown  that  on-line  test¬ 
ing  of  dormant  faults  is  essential  in  achieving  high  reliability  in  safety-critical  applications  [52]. 

Our  approach  is  a  type  of  built-in  self-test  (BIST),  where  both  the  test-pattern  generation  and  the 
analysis  of  the  output  responses  are  done  within  the  circuit  under  test.  This  is  much  more  difficult 
than  external  testing,  since  we  cannot  assume  that  the  FPGA  resources  used  in  test-pattern  gener¬ 
ators  (TPGs)  or  in  output  response  analyzers  (ORAs)  are  fault  free.  Nevertheless,  our  logic  BIST 
techniques  will  detect  any  single  or  multiple  fault  in  a  cell,  and  any  possible  combination  of  mul¬ 
tiple  faulty  cells;  our  interconnect  BIST  techniques  will  detect  any  single  or  multiple  fault  affect¬ 
ing  the  interconnect  network.  By  exhaustively  testing  both  the  programmable  logic  blocks  (PLBs) 
and  the  programmable  interconnect,  our  tests  implicitly  detect  all  possible  stuck  faults  affecting 
the  configuration  memory. 

We  have  developed  diagnostic  techniques  that  achieve  the  highest  possible  resolution,  locating 
any  group  of  faulty  PLBs,  any  group  of  interconnect  faults,  and  also  identifying  defective  mod¬ 
ules  inside  the  faulty  cells.  (Note  that  one  cannot  distinguish  between  a  configuration  bit  being 
stuck  and  the  corresponding  fault  in  the  resource  controlled  by  the  stuck  bit.)  The  diagnostic  goal 
of  conventional  FT  techniques  for  FPGAs  is  to  identify  faulty  resources,  which  are  then  bypassed 
and  replaced  by  spares.  The  high  accuracy  of  our  diagnosis  allows  us  to  introduce  a  new  form  of 
fault  tolerance,  based  on  reusing  faulty  resources  whenever  possible.  A  partially  usable  block 
(PUB)  is  a  faulty  PLB  with  identified  failing  mode(s)  of  operation.  A  PUB  may  be  used  as  a  fault- 
free  cell  in  the  working  part  of  the  system,  provided  that  its  faults  do  not  affect  its  intended  opera¬ 
tion.  Similarly,  we  allow  the  system  logic  to  safely  reuse  faulty  wire  segments.  Reusing  defective 
hardware  resources  increases  the  effective  spare  capacity  and  leads  to  more  graceful  degradation 
and  longer  mission  life. 
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In  addition,  our  roving  STARs  approach  has  important  advantages  in  FT  applications.  In  most 
previous  FT  work,  faults  are  detected  in  the  working  logic,  and  they  must  be  located  and  bypassed 
very  quiekly  to  restart  the  normal  operation  as  soon  as  possible.  In  contrast,  in  our  approach  the 
detected  faults  do  not  affeet  the  working  logic,  so  the  system  operation  does  not  have  to  be  inter¬ 
rupted  for  fault  diagnosis  and  for  computing  fault-bypassing  reconfigurations. 

We  also  introduce  the  idea  of  an  adaptive  system  eloek.  One  side-effeet  of  FT  reeonfigurations  is 
a  possible  inerease  of  the  delays  of  some  critical  paths  of  the  cireuit.  The  eonventional  solution  to 
this  problem  is  to  set  the  clock  period  with  sufficient  slack  to  work  for  any  possible  FT  reconfigu¬ 
ration.  But  this  cost  in  performance  degradation  is  paid  all  the  time,  even  if  the  worst-ease  faults 
will  never  oecur.  We  use  a  programmable  elock  whose  initial  frequeney  is  set  to  the  maximum 
value  allowed  in  the  fault-free  circuit,  and  we  adjust  it  only  when  required  as  a  result  of  FT  recon¬ 
figurations  [14].  In  this  way  the  ACS  performance  will  be  gracefully  degraded,  and  only  when 
needed  in  response  to  the  oecurrenee  of  new  faults. 

We  have  sucessfully  implemented  this  approach  using  a  commercially  available  FPGA  -  specifi¬ 
cally,  the  ORCA  2C  and  2CA  series  FPGA  [16].  However,  we  emphasize  that  our  approach  is 
general,  as  it  may  be  applied  to  any  FPGA  that  uses  segmented  intereonneet  and  features  RTR. 
The  remainder  of  this  report  is  organized  as  follows.  Section  3  presents  the  structure  and  the  oper¬ 
ation  of  the  roving  STARs.  Section  4  describes  the  BIST  and  BIST-based  diagnosis  of  logie  and 
intereonneet  resources  in  the  FPGA.  Seetion  5  outlines  the  main  eoneepts  of  our  new  FT  teeh- 
niques  for  faults  in  logic  and  in  interconnect.  Section  6  concludes  the  report.  Section  7  lists  the 
partieipants  in  the  project,  the  patents  obtained  or  applied  for,  and  publieations.  Seetion  8  contains 
references. 
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3.  The  Roving  STARs 

3.1  The  Test  and  Reconfiguration  Controller 

Since  ACS  normal  system  operation  involves  reconfiguring  FPGAs  for  different  funetions,  the 
reeonfiguration  proeess  is  controlled  by  a  module  external  to  FPGAs;  typieally,  this  is  an  embed¬ 
ded  microprocessor  having  some  memory  to  store  configurations.  We  extend  the  tasks  of  this  pro¬ 
cessor  to  also  eontrol  the  test,  diagnosis,  and  FT  functions,  including  their  associated 
reeonfigurations.  We  refer  to  this  processor  as  the  test  and  reconfiguration  controller  (TREC). 
TREC  accesses  an  FPGA  using  its  boundary-scan  mechanism  [51],  so  that  this  access  is  transpar¬ 
ent  to  the  normal  function  of  the  ehip.  TREC  also  employs  the  boundary- scan  interface  to  par¬ 
tially  reconfigure  the  FPGA,  to  control  the  BIST,  and  to  retrieve  the  test  results.  TREC  uses  RTR 
to  rove  the  STARs  across  the  chip  and  to  reconfigure  them  for  different  test  and  diagnosis  opera¬ 
tions.  TREC  also  eontrols  the  system  clock(s)  and  has  the  capability  to  stop  the  system  operation 
for  short  intervals  to  allow  for  safe  relocation  of  the  system  logie  in  the  last  tested  area. 

If  faults  are  detected  during  BIST,  TREC  analyzes  the  results  to  locate  the  faults.  If  the  faults 
could  not  be  preeisely  diagnosed,  TREC  will  run  additional  diagnostic  configurations  until  the 
desired  resolution  is  achieved.  TREC  also  keeps  track  of  the  status  of  FPGA  resourees,  and  deter¬ 
mines  if  the  defective  hardware  located  in  the  current  STAR  affects  the  operation  of  the  next  slice 
of  the  working  area  to  be  reloeated  over  the  STAR  position.  If  the  faults  affect  only  spare 
resourees,  or  if  the  faulty  resources  may  be  safely  reused  to  implement  the  required  funetion, 
TREC  proceeds  with  the  relocation.  If  the  faulty  resources  may  not  be  reused,  then  TREC  deter¬ 
mines  eonfiguration  ehanges  to  the  working  area  so  that  the  defective  resourees  are  bypassed  and 
replaeed  by  fault-free  ones.  For  faults  affeeting  its  own  operation,  TREC  employs  mieroproces- 
sor-specific  FT  techniques  that  will  not  be  discussed  in  this  report. 

3.2  The  Roving  STARs  Structure 

Figure  2  illustrates  the  floor  plan  of  an  FPGA  at  some  stage  of  the  roving  process;  the  FPGA  has  a 
working  area  and  a  spare  area  eonsisting  of  a  vertieal  STAR  (V-STAR)  with  two  columns  and  an 
horizontal  STAR  (H-STAR)  with  two  rows.  Depending  on  the  positions  of  the  two  STARs,  the 
working  area  may  be  contiguous  or  split  in  two  of  four  regions.  Most  of  the  time  only  one  STAR 
is  aetive  (testing)  at  any  time.  A  STAR  tests  both  the  PEBs  and  the  interconneet  in  its  area.  A  PEB 
may  use  loeal  interconneet  to  eommunicate  with  its  adjaeent  eells,  and  global  interconneet  for 
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Figure  2.  Horizontal  and  Vertical  STARs 
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connections  with  other  cells  or  input/output  (I/O)  blocks.  The  latter  category  includes  long  lines, 
that  extend  horizontally  along  an  entire  row  or  vertically  along  an  entire  column.  While  only  one 
STAR  would  be  sufficient  to  test  all  the  PLBs  in  the  FPGA,  testing  of  both  vertical  and  horizontal 
global  busses  and  of  the  connections  between  them  requires  separate  vertical  and  horizontal 
STARS.  Having  two  roving  STARs  also  provides  additional  benefits  for  diagnosis  and  fault  toler¬ 
ance  that  will  be  discussed  in  the  next  sections.  Testing  the  entire  FPGA  takes  one  full  horizontal 
sweep  of  V-STAR  and  one  full  vertical  sweep  of  H-STAR.  The  direction  of  roving  changes  after 
every  full  sweep:  left  aright,  up^down,  right^left,  and  down^up.  For  an  A/XAFPGA,  a  full 
sweep  of  one  STAR  requires  NH  positions,  thus  the  total  number  of  STAR  movements  is  N.  An 
important  consequence  is  that  the  fault  latency  increases  linearly  with  the  size  of  the  FPGA. 

To  preclude  the  usage  of  the  spare  resources  by  the  FPGA  design  tools,  the  spare  PLBs  in  both 
STARs,  the  vertical  routing  tracks  in  V-STAR,  and  the  horizontal  routing  tracks  in  H-STAR  are 
designated  as  reserved.  Connections  among  the  working  PLBs  are  allowed  to  use  only  horizontal 
wire  segments  through  V-STAR,  and  only  vertical  segments  through  H-STAR;  the  same  rule  also 
applies  to  connections  with  the  I/O  blocks  of  the  FPGA.  For  an  A/XA/FPGA,  the  area  overhead 

taken  by  the  two  STARs  is  OV(N)  =  l-(V-2)^/V^.  The  overhead  decreases  with  N.  For  example, 
for  V=20,  OV  is  19  percent,  but  for  V=40,  OV  is  only  10  percent.  Another  part  of  the  overhead 
consists  of  routing  resources  for  a  four-wire  bus  around  the  perimeter  of  the  array,  needed  for  the 
links  between  the  boundary-scan  test  access  port  (TAP)  controller  and  BIST  circuitry  within  the 
STARS. 

3.3  The  Roving  Mechanism 

Once  a  STAR  has  been  tested,  the  STAR  roving  process  continues  by  relocating  the  working  area 
adjacent  to  the  STAR  over  the  current  STAR  position,  and  by  reconfiguring  the  just-released 
working  area  as  a  new  STAR.  Figure  3  illustrates  several  aspects  of  this  procedure  by  following  a 
section  of  one  row  during  several  steps.  Figure  3a  shows  the  initial  state,  where  A,  D,  E,  and  F  are 
working  PLBs,  separated  by  B  and  C  which  are  part  of  V-STAR.  Assume  all  test  activity  has 
stopped  without  detecting  any  faults,  while  the  normal  operation  continues.  TREC  performs  the 
following  operations:  (1)  Configure  B  and  C  with  the  functions  of  D  and  E;  (2)  Stop  the  system 
clock;  (3)  If  necessary,  copy  the  state  of  D  and  E  into  B  and  C;  (4)  Reconfigure  to  disconnect  the 
nets  connected  to  D  and  E  and  connect  them  to  B  and  C;  (5)  Restart  the  system  clock;  (6)  Config¬ 
ure  the  new  STAR  and  restart  testing.  The  critical  steps  are  (3)  and  (4),  which  must  be  as  fast  as 
possible  to  minimize  the  period  when  the  system  clock  is  suspended;  these  steps  are  explained 
next. 

Step  (3)  -  copying  the  state  values  from  the  old  to  the  new  locations  of  the  relocated  working 
PLBs  -  is  not  needed  for  PLBs  that  are  used  only  as  combinational  functions.  Otherwise,  if  the 
relocated  functions  include  registers  and/or  RAM,  the  implementation  of  this  step  depends  on  the 
features  of  the  target  FPGA.  For  an  FPGA  that  allows  direct  reading  and  writing  of  registers  by 
treating  them  as  addressable  memory  locations,  the  task  is  straightforward.  Although  a  little  more 
complicated,  solutions  based  on  partial  configuration  readback,  available  on  ORCA  series  3C 
[16],  are  also  easy  to  implement.  For  FPGAs  without  such  features,  a  general  solution  is  to  use  an 
additional  configuration  that  creates  temporary  transfer  paths  through  the  STAR  from  the  source 
PLBs  to  the  destination  PLBs  and  issue  a  single  clock  (derived  from  the  test  clock  TCK)  to  copy 


Figure  3.  Roving  Steps 

the  values.  Figure  3b  illustrates  a  path  T created  to  transfer  values  from  D  to  B  (we  assumed  that  E 
implements  a  combinational  function).  In  addition  to  a  transfer  path,  copying  RAM  contents 
requires  the  creation  of  a  transfer  controller  to  sequence  both  the  source  and  the  destination 
RAMs  concurrently  through  their  address  space  and  to  issue  the  proper  read  and  write  signals  for 
the  desired  transfer.  The  transfer  controller  can  be  placed  in  the  STAR  not  involved  in  the  current 
roving  step  (H-STAR  in  our  example). 

Step  (4)  maintains  the  correct  interconnections  among  working  PLBs;  the  results  of  this  step  can 
be  seen  in  Figure  3c.  Net  Q,  originally  between  F  and  D,  has  been  extended  to  connect  to  B.  In 
contrast,  net  R  has  been  shrunk  since  now  it  connects  adjacent  PLBs.  Net  P,  originally  a  direct 
connection  between  E  and  F,  has  been  routed  via  two  vertical  segments  and  a  horizontal  one.  Net 
S  has  been  moved  with  the  PLBs  it  connects. 

These  changes  are  done  by  an  incremental  rerouting  of  the  affected  area  of  the  chip,  using  the 
same  netlist,  changing  the  placement  of  the  relocated  logic,  marking  the  vertical  tracks  in  the  new 
STAR  position  as  unusable,  and  saving  the  resulting  incremental  configurations.  For  a  full  hori¬ 
zontal  swap  we  need  to  repeat  this  process  NH  times.  We  are  using  a  standard  router  for  this  job. 
Since  computing  these  roving  configurations  is  not  done  on-line,  but  as  a  preprocessing  step,  the 
cost  of  the  N  incremental  routing  jobs  is  acceptable.  A  more  efficient  incremental  routing  proce¬ 
dure  is  described  by  Dutt  et  al.  in  [10],  but  this  requires  a  special-purpose  router. 

Inserting  a  STAR  in  the  working  area  introduces  a  delay  overhead  in  the  signals  that  now  have  to 
traverse  the  STAR,  typically  corresponding  to  a  two-cell  long  wire  segment.  Although  relatively 
small,  this  overhead  must  be  compensated  by  a  slack  in  the  operating  clock  frequency.  In  the 
worst  case,  a  signal  in  the  application  circuit’s  critical  path  may  be  split  by  one  or  two  STAR 
widths.  To  investigate  this  performance  penalty,  we  used  seven  benchmark  circuits.  First  we 
mapped  the  circuit  as  a  non-FT  FPGA  design  using  commercial  computer-aided  design  (CAD) 
tools,  with  options  set  for  performance  optimization.  Then  we  mapped  the  same  set  of  circuits  to 
the  same  FPGA  with  two  inserted  STARs.  The  results  presented  in  Table  1  are  based  on  imple¬ 
menting  the  seven  benchmarks  with  ORCA  2C15A,  and  show  the  number  of  PLBs,  the  initial 
maximum  delay,  the  maximum  delay  in  the  presence  of  STARs,  and  the  percent  increase.  The 
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maximum  delays  were  computed  for  the  10  positions  of  the  V-STAR.  For  six  circuits,  the  delay 
increase  caused  by  the  insertion  of  the  STARs  is  between  0  and  16.3  percent. 


Table  1 .  Performance  Penalty  with  STARs 


Circuit 

#PLBs 

Initial 
Delay  (sec) 

Delay  w/ 
STARS  (sec) 

%  diff 

Huffman 

40 

87.5 

98.9 

11.5 

Fibonacci 

70 

130.7 

130.8 

0.0 

Wallace  tree  multiplier 

65 

144.3 

149.7 

3.6 

Digital  Single  Sideband  Modulator 

229 

75.1 

76.6 

2.0 

Hilbert 

192 

73.2 

87.5 

16.3 

Random  number  generator 

134 

49.4 

56.4 

11.5 

Mono-FFT 

62 

117.8 

120.8 

2.5 
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4.  On-Line  Testing  and  Diagnosis 


4.1  Basic  BIST  Structure 


A  STAR  contains  several  disjoint  tiles  that  are  tested  concurrently.  Figure  4  shows  the  basic  BIST 
structure  used  in  a  tile,  ealled  a  BISTER,  which  inherits  some  of  the  coneepts  used  in  our  off-line 
FPGA  BIST  methods  for  PLBs  [56,  55,  2]  and  programmable  interconnect  [57],  A  BISTER  con¬ 
tains  a  TPG  applying  test  patterns  to  two  identically  configured  blocks  under  test  (BUTs)  in  ease 
of  logie  BIST,  or  two  sets  of  wires  under  test  (WUTs)  in  the  ease  of  routing  BIST.  The  outputs  of 
the  BUTs  or  WUTs  are  compared  by  an  ORA.  The  ORA  latches  and  reports  mismatches  as  test 
failures.  Direct  comparison  avoids  possible  aliasing  errors  introdueed  by  compressing  responses 
into  signatures  [1].  BIST  Start/Reset  slyiA  Pass/Fail  are  two  of  the  interface  signals  with  the  TREC 
via  the  boundary-scan  access  mechanism.  Start/Reset  is  used  to  initiate  the  BIST  sequence  and  to 
reset  the  TPG  and  ORA.  The  test  result  Pass/Fail  is  eaptured  in  a  FF  whieh  is  part  of  a  sean  regis¬ 
ter  [18].  The  other  two  inputs  from  the  TREC/boundary-sean  interfaee  include  TCK  for  clocking 
the  BIST  circuitry  and  a  control  input  for  scanning  out  the  Pass/Fail  results  from  each  BISTER. 
The  first  eonfiguration  of  a  BISTER  cheeks  the  proper  operation  of  the  sean  register,  indueing 
mismatches  by  comparing  BUTs  or  WUTs  with  different  eonfigurations.  This  also  proteets 
against  the  case  of  all  ORA  FFs  being  stuck  at  the  “Pass”  value. 


BIST  Start/Reset 


Figure  4.  Basic  BISTER 


The  TREC  aeeesses  an  FPGA  using  its  boundary-sean  meehanism  [51],  so  that  this  aeeess  is 
transparent  to  the  normal  function  of  the  chip  (most  FPGAs  support  boundary  scan).  TREC 
employs  the  boundary-scan  interface  to  reconfigure  the  STARs  for  different  test  and  diagnosis 
operations,  to  initiate  the  BISTERs,  and  to  sean  out  the  test  results.  BISTERs  work  with  the 
boundary-scan  clock  TCK,  and  the  configuration  clock  is  also  derived  from  TCK.  TREC  also  con¬ 
trols  the  system  clock(s)  and  has  the  capability  to  stop  the  system  operation  for  short  intervals  to 
allow  for  safe  reloeation  of  the  system  logie  in  the  last  tested  area.  If  faults  are  detected,  TREC 
starts  the  diagnosis  process  (to  be  discussed  later). 

Figure  5  illustrates  a  typical  ORA  that  compares  four  pairs  of  outputs  coming  from  the  two  BUTs 
(or  WUTs).  The  feedback  latches  the  first  mismatch  into  the  FF.  Higher  diagnostic  resolution 
could  be  obtained  by  constructing  an  independent  ORA  (comparison  and  latching  FF)  for  each 
pair  of  outputs  to  be  compared.  However,  this  would  increase  the  Pass/Fail  result  data  to  be 
retrieved  by  a  faetor  of  four.  While  higher  diagnostie  resolution  is  important  for  the  FT  tech¬ 
niques,  there  is  a  tradeoff  between  diagnostic  resolution  and  the  total  testing  time  which,  in  turn, 
determines  the  fault  latency.  Therefore,  to  minimize  fault  latency,  we  break  the  fault  detection  and 
fault  diagnosis  into  two  separate  proeesses  when  neeessary.  We  apply  diagnosties  proeedures  only 
after  we  have  detected  faults  in  the  STAR.  Accordingly,  we  break  our  discussion  of  the  fault 


11 


Figure  5.  Integrated  ORA/Scan  cell 

detection  and  diagnostic  processes  into  the  following  separate  subsections  for  both  logic  and 
interconnect. 

4.2  Testing  Programmable  Logic  Blocks 

Figure  6  shows  a  typical  structure  of  a  PLB,  composed  of  a  memory  module,  a  register,  and  a 
combinational  output  module.  This  structure  is  found  in  most  current  RAM-based  FPGAs.  The 
memory  block  can  implement  combinational  lookup  tables  (LUTs)  or  a  RAM  with  various  modes 
of  operation  (synchronous,  asynchronous,  single-port,  dual-port,  etc.).  A  PLB  also  contains  spe¬ 
cial-purpose  logic  for  arithmetic  functions  (counters,  adders,  multipliers,  etc.)  The  register  can  be 
configured  as  FFs  or  latches  with  programmable  clock-enable,  preset/clear,  and  data  selector 
functions. 


Figure  6.  Typical  PLB  Structure 


The  BUTs  are  repeatedly  configured  to  be  pseudoexhaustively  tested  in  every  mode  of  operation; 
this  means  that  every  module  in  a  BUT  is  exhaustively  tested  [39].  The  configuration  of  the  TPG 
also  changes  when  a  new  BUT  configuration  requires  different  patterns.  For  example,  to  test  com¬ 
binational  functions  implemented  by  a  LUT  with  n  inputs,  the  TPG  is  a  counter  that  generates  all 

possible  2”  vectors,  while  to  test  the  RAM,  the  TPG  is  a  state  machine  generating  standard  RAM 
test  sequences  [59],  which  are  known  to  be  exhaustive  for  the  fault  models  specific  to  RAMs,  such 
as  pattern-sensitive  faults.  The  ORA  needs  to  be  reconfigured  when  the  new  mode  of  operation  of 
BUT  involves  a  different  number  of  outputs. 

A  configuration  multiplexer  (MUX)  is  a  commonly  used  hardware  mechanism  that  selects  subcir¬ 
cuits  for  various  modes  of  operation.  A  configuration  MUX  is  controlled  by  configuration  mem¬ 
ory  bits  to  select  one  input  to  be  connected  to  its  output.  In  Figure  7a,  assume  that  we  set  the 
configuration  bit  5  to  0  to  connect  VO  to  X.  Then  the  subcircuit  producing  VI  disappears  from  the 
circuit  model  seen  by  the  user.  This  is  correct  from  a  design  viewpoint,  because  the  value  VI  can 
no  longer  affect  A /«  the  current  configuration.  But  from  a  testing  viewpoint,  in  any  test  for  the 
MUX,  we  need  to  set  VO  and  VI  to  complementary  values.  In  general,  for  a  MUX  with  k  inputs,  if 
V  is  the  value  of  the  selected  input,  all  the  other  k-\  inputs  should  be  set  to  value  V. 
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Figure  7.  Configuration  Multiplexer 


The  problem  arises  beeause  FPGA  CAD  tools  generate  the  configuration  bitstream  based  on  the 
user  model,  which  will  never  include  the  functionally  inactive  subcircuits  (called  invisible  logic  in 
[56]).  Thus  in  Figure  7a,  when  S=0,  VO  will  be  set  to  both  0  and  1,  but  VI  cannot  change.  Simi¬ 
larly,  the  user  logic  cannot  control  VO  in  any  configuration  where  5'=1 .  The  result  is  that  the  testing 
of  the  MUX  may  not  be  complete.  For  example,  the  s-a-1  fault  in  the  gate-level  MUX  model  in 
Figure  7b  is  detected  only  when  S=0,  V0=0,  and  VJ=1.  But  this  pattern  may  never  be  applied  if  VJ 
cannot  be  controlled  when  S=0. 

Our  solution  relies  on  separately  configuring  the  invisible  logic  so  that  it  will  generate  the  proper 
values  needed  for  the  inactive  MUX  inputs.  Then  we  overlay  the  resulting  configuration  files  over 
the  main  configuration  file  with  the  active  logic,  and  we  merge  them  without  changing  any  MUX 
setting  done  in  the  main  configuration.  This  process  is  conceptually  simple,  but  its  implementa¬ 
tion  requires  knowledge  of  the  FPGA  configuration  stream  structure. 

Note  that  in  most  previous  work  dealing  with  testing  FPGAs,  the  problem  of  testing  a  configura¬ 
tion  MUX  is  either  not  addressed  or  it  is  solved  functionally,  by  connecting  every  input  in  turn  to 
the  output,  and  providing  both  0  and  1  values  to  the  selected  input.  However,  the  invisible  logic 
driving  the  inactive  inputs  is  completely  ignored.  Hence  prior  claims  of  complete  testing  may  not 
be  valid  since  the  testing  of  every  configuration  MUX  in  the  FPGA  is  likely  to  be  incomplete. 
Methods  that  did  provide  complete  tests  for  configuration  multiplexers,  such  as  [27,  55],  used 
models  that  did  not  remove  the  invisible  logic.  But  such  models  cannot  be  used  with  the  existing 
FPGA  CAD  tools  to  generate  configuration  files. 

Since  a  BISTER  provides  complete  testing  only  for  its  BUTs,  we  have  to  reconfigure  every  BIS¬ 
TER  several  times  so  that  every  PEB  will  be  a  BUT  in  at  least  one  configuration.  The  BUTs  are 
repeatedly  configured  to  be  tested  in  all  their  modes  of  operation.  The  modes  of  operation  of  a 
PEB  may  be  determined  only  from  the  information  available  in  the  FPGA  data  book,  without  hav¬ 
ing  a  detailed  knowledge  of  its  implementation.  The  test  time  for  a  BISTER,  and  thus  total  roving 
time,  is  dominated  by  the  reconfiguration  time,  which  is  much  larger  than  the  BIST  time.  Hence  to 
reduce  the  latency  of  our  procedure,  we  try  to  minimize  the  number  of  PLBs  in  a  BISTER.  The 
number  of  PLBs  for  an  ORA  and  for  a  TPG  depend  on  the  target  PEB  architecture.  Usually  the 
number  of  outputs  in  a  PEB  is  smaller  than  its  number  of  inputs.  Since  a  TPG  must  provide 
exhaustive  patterns  for  BUTs,  we  will  need  more  than  one  PEB  to  construct  the  TPG.  For  illustra¬ 
tion  purposes,  we  will  assume  that  a  TPG  needs  three  PLBs  and  an  ORA  only  one.  (Our  analysis, 
however,  is  independent  of  the  number  of  PLBs  used  in  implementation.) 
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Figure  8.  3  by  2  BISTER  Tile  Rotations 

Figure  8  illustrates  six  floor  plans  of  a  3  by  2  BISTER  tile,  where  T,  B,  and  O  denote,  respectively, 
a  TPG  cell,  a  BUT,  and  an  ORA  cell.  The  goal  of  the  six  configurations  is  to  systematically  rotate 
the  functions  of  the  PLBs,  so  that  eventually  every  PLB  in  the  tile  is  completely  tested  twice,  each 
time  being  compared  with  a  different  BUT.  This  rotating  strategy  assures  that  every  single  faulty 
PLB  and  almost  all  combinations  of  multiple  faulty  PLBs  (except  for  a  few  pathological  cases 
described  in  [2])  are  guaranteed  to  be  detected.  If  the  number  of  PLBs  in  a  STAR  is  not  a  multiple 
of  the  number  of  PLBs  in  a  tile,  then  the  leftover  PLBs  that  could  not  make  up  a  BISTER  will  be 
grouped  with  some  of  the  PLBs  already  tested  in  the  adjacent  tile,  so  that  every  PLB  in  the  STAR 
will  eventually  be  part  of  a  BISTER. 


Figure  9  illustrates  the  floor  plan  of  the  4  by  2  BISTER  tile  used  in  our  implementation  with  the 
ORCA  2C  series  FPGA  and  its  associated  eight  test  configurations  (S  indicates  a  spare  PLB).  The 
role  of  the  spare  cells  is  to  provide  additional  routing  resources  to  overcome  routing  congestion 
problems.  We  arrange  the  number  of  spare  PLBs  so  that  a  tile  will  have  an  even  number  of  PLBs, 
symmetrically  distributed  between  the  two  columns  of  the  V-STAR.  Similar  to  the  3  by  2  BISTER 
tile,  the  eight  configurations  shown  in  Figure  9  systematically  rotate  the  functions  of  the  PLBs,  so 
that  every  PLB  in  the  tile  is  completely  tested  twice,  each  time  being  compared  with  a  different 
BUT.  To  overcome  routing  congestion  problems,  spare  PLBs  may  be  incorporated  into  the 
BISTER  tile  to  provide  additional  routing  resources.  While  these  are  not  pure  rotations,  since  they 
maintain  the  3  TPG  cells  in  the  same  column,  they  do  achieve  the  same  property  as  the  rotations 
of  the  3X2  tile.  This  rotating  strategy  allows  us  to  prove  the  following  properties  regarding  the 
fault  detection  capability  of  a  BISTER  [4]: 
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Figure  9.  4  by  2  BISTER  Tile  Rotations 


Claim  1:  Any  single  faulty  PLB  is  guaranteed  to  be  detected  in  at  least  two  BISTER  configura¬ 
tions. 

Proof:  The  faulty  PLB  is  a  BUT  in  two  BISTER  configurations,  where  its  exhaustive  inputs  pat¬ 
terns  are  produced  by  a  fault-free  TPG,  and  its  outputs  are  compared  with  a  fault-free  BUT  by  a 
fault-free  ORA.  Hence  no  fault  (single  or  multiple)  detected  in  the  BUT  can  escape  detection  in 
these  two  configurations. 
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Claim  2:  Except  for  a  few  pathological  cases,  any  pair  of  faulty  PLBs  is  guaranteed  to  be  deteeted 
in  at  least  one  BISTER  configuration. 

Proof:  Since  any  single  faulty  PLB  is  detected,  we  must  have  a  eireular  masking  relation  [1], 
where  one  faulty  PLB  masks  the  deteetion  of  the  faults  in  the  other  and  viee  versa.  This  masking 
should  occur  in  any  BISTER  configuration  where  a  single  faulty  PLB  would  be  detected.  First  we 
note  that  any  masking  relation  between  the  two  faulty  PLBs  disappears  whenever  one  of  them 
becomes  a  spare.  Because  a  TPG  or  an  ORA  containing  a  faulty  PLB  may  still  work  correctly,  we 
analyze  only  configurations  when  at  least  one  of  the  faulty  PLBs  is  a  BUT  and  the  other  is  not  a 
spare. 

Case  1:  Both  faulty  PLBs  are  BUTs  in  one  BISTER  eonfiguration.  Then  the  TPG  and  the  ORA 
are  fault-free,  and  the  only  way  the  faulty  pair  can  escape  detection  is  to  have  functionally  equiva¬ 
lent  faults,  since  then  their  identieal  outputs  will  not  mismatch.  Assume  that  the  faulty  PLBs  are 
BUTs  in  the  configuration  1  in  Figure  10.  Then  in  configurations  3  and  5,  one  of  them  is  a  BUT 
(compared  with  a  fault-free  BUT  by  a  fault-free  ORA),  and  the  other  is  a  TPG  cell.  The  only  way 
the  faulty  pair  can  escape  deteetion  is  for  the  faulty  TPG  to  skip  exactly  the  patterns  that  detect  the 
faulty  BUT.  We  say  that  this  is  a  pathologieal  ease,  because  it  has  an  extremely  low  probability  of 
occurrence:  we  need  two  faulty  PLBs,  and  they  must  have  functionally  equivalent  faults,  and 
when  a  faulty  PLB  is  part  of  the  TPG,  the  TPG  must  skip  those  patterns  that  detect  the  faulty 
BUT. 


Figure  10.  BISTER  Tile  Rotations  (Two  Faulty  BUTs) 

Case  2:  At  most  one  faulty  PLB  is  a  BUT  in  any  BISTER  configuration.  Assume,  without  loss  of 
generality,  that  the  faulty  cells  are  in  the  first  row  of  the  BISTER,  as  shown  in  Figure  11.  Let  Abe 
the  faulty  BUT  and  Y  the  faulty  TPG  cell  in  configuration  1 .  To  have  eireular  masking  between  A 
and  Y,  all  of  the  following  conditions  must  be  true:  1)  in  configuration  1,  the  TPG  must  skip  those 
patterns  that  deteet  A;  2)  in  eonfiguration  4,  the  TPG  must  skip  those  patterns  that  deteet  T;  3)  in 
eonfiguration  2,  the  ORA  Amust  not  record  the  mismateh  between  Y  and  the  fault-free  BUT;  and 
4)  in  configuration  3,  the  ORA  Y  must  not  record  the  mismatch  between  A  and  the  fault- free  BUT. 
Again,  this  is  a  pathological  situation  because  we  need  the  AND  of  four  conditions,  which  are 
very  unlikely  by  themselves. 


Figure  11.  BISTER  Tile  Rotations  (One  Faulty  BUT) 

Clearly,  similar  arguments  can  be  made  for  more  than  two  faulty  PLBs.  Hence  we  can  conclude 
that: 
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Claim  3:  In  practice,  any  combination  of  faulty  PLBs  is  detected  in  at  least  one  BISTER  configu¬ 
ration. 


Table  2  summarizes  the  14  test  phases  we  have  developed  to  completely  test  a  PLB  in  our  imple¬ 
mentation  of  the  roving  STARs  approaeh  using  the  ORCA  2C  and  2CA  series  FPGAs  [16].  The 
last  five  phases  apply  only  to  the  2CA  series  PLB.  The  FF  modes  can  be  tested  at  the  same  time 
with  five  of  the  LUT  modes  to  reduee  fault  latency  (as  illustrated  in  Table  2),  or  we  ean  test  them 
separately  to  enhance  diagnostic  resolution.  We  ehose  to  minimize  fault  lateney  in  our  implemen¬ 
tation  at  the  expense  of  diagnostic  resolution.  Thus  complete  testing  of  a  PLB  requires  14  BIS¬ 
TER  eonfigurations.  One  additional  BISTER  configuration  is  added  to  eaeh  set  of  test  phases  to 
test  the  integrated  ORA/ scan  chain  for  a  total  of  15  test  phases  per  rotation  of  the  BISTER. 


Table  2:  Summary  of  BIST  Phases  for  ORCA  2C  and  2CA  Logic  Blocks 


Phase 

FF/Latch  Modes  &  Options 

LUT 

No. 

Logic  Tested  | 

No. 

FF/Latch 

Set/Reset 

Clock 

Clock 

Enable 

FF  Data 

Mode 

Outs 

LUT 

FFs 

MUX 

1 

- 

- 

- 

- 

- 

Asynchro¬ 
nous  RAM 

4 

V 

V 

2 

- 

- 

- 

- 

- 

Adder/sub¬ 

tracter 

5 

V 

V 

3 

- 

- 

- 

- 

- 

5 -variable 
MUX 

4 

V 

V 

4 

- 

- 

- 

- 

- 

5 -variable 
XOR 

4 

V 

V 

5 

FF 

async.  reset 

falling  edge 

active  low 

LUT  output 

Count  up 

5 

V 

V 

6 

FF 

async.  set 

falling  edge 

enabled 

PLB  input 

Count  up/ 
down 

5 

V 

V 

V 

7 

Latch 

sync,  set 

active  low 

active  high 

LUT  output 

Count  down 

5 

V 

V 

V 

8 

FF 

sync,  reset 

rising  edge 

active  low 

PLB  input 

4-variable 

4 

V 

V 

V 

9 

Latch 

- 

active  high 

active  low 

dynamic  select 

4-variable 

4 

V 

V 

10 

- 

- 

- 

- 

- 

Multiplier 

5 

11 

- 

- 

- 

- 

- 

Greater/equal 
to  Comp 

5 

V 

12 

- 

- 

- 

- 

- 

Not  equal  to 
Comp 

5 

V 

13 

- 

- 

- 

- 

- 

Synchronous 

RAM 

4 

V 

14 

- 

- 

- 

- 

- 

Dual  port 
RAM 

4 

V 

The  operational  modes  of  the  PLBs  require  between  9  and  12  inputs  be  supplied  with  test  pat¬ 
terns.  Since  a  4-bit  counter  can  be  easily  implemented  in  a  single  PLB,  we  use  three  PLBs  for  a 
TPG.  An  ORA  eonsisting  of  a  4-bit  eomparator  and  a  FF  to  lateh  mismatches  is  implemented  by  a 
single  PLB.  Figure  12  shows  the  actual  layout  of  a  BISTER  tile  in  the  H-STAR.  Two  spare  PLBs 
(denoted  by  S/0  and  SP),  configured  as  identity  functions,  have  been  added  to  avoid  the  use  of 
global  vertical  routing  resources.  The  layout  also  shows  the  boundary-scan  TAP  controller 
(denoted  by  BS  TAP),  which  is  not  part  of  the  BISTER  tile  [18].  Since  the  tile  has  8  PLBs,  we  will 
have  8  rotations  of  this  layout  (see  Figure  9),  and  every  BISTER  will  be  reconfigured  15  times  for 
a  total  of  120  eonfigurations  per  STAR  to  eompletely  test  all  of  the  PLBs  within  that  STAR. 
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Figure  12.  Layout  of  a  BISTER  Tile 


Partial  reconfiguration  reduces  the  time  required  for  downloading  the  BISTER  tile  configurations 
(and  therefore  the  fault  latency),  and  it  also  significantly  reduces  the  amount  of  storage  required 
for  the  BISTER  tile  configurations.  For  example,  the  complete  configuration  memory  for  the 
2C15A  consists  of  about  225  Kbits.  Partial  configuration  of  the  complete  STAR  requires  about  23 
Kbits.  However,  by  performing  partial  configuration  for  only  those  portions  of  the  STAR  that 
must  be  changed  from  one  BISTER  tile  configuration  to  the  next  requires  only  between  3  Kbits 
and  10  Kbits,  depending  on  how  much  of  the  STAR  is  being  reconfigured;  changing  the  mode  of 
operation  of  the  BUTs  requires  the  smallest  number  of  bits,  while  rotating  the  BISTER  tile  to 
begin  testing  a  new  set  of  BUTs  requires  the  largest  number  of  bits.  As  a  result,  only  about  7  per¬ 
cent  of  the  files  are  of  the  larger  size  (10  K),  while  93  percent  of  the  files  are  the  smaller  size  (3 
K),  for  a  total  of  about  420  K  bits  of  storage  required  for  the  PLB  BIST  configurations. 

4.3  Divide-and-Conquer  Adaptive  Diagnosis 

The  goal  of  our  original  approach  was  to  have  a  unique  procedure  able  to  locate  any  combination 
of  multiple  faulty  PLBs  in  a  BISTER  without  making  any  restrictive  assumption  about  the  multi¬ 
plicity  of  the  fault.  We  start  by  considering  all  the  PLBs  in  a  failing  BISTER  as  suspects. 

Figure  13  illustrates  this  adaptive  diagnosis  procedure.  The  V-STAR  has  a  failing  BISTER  with 
six  suspected  PLBs.  First  they  are  split  in  two  sets  -  {A,B}  and  {C,D,E,F}  -  which  are  included  in 
separate  BISTERs  whose  other  PLBs  were  shown  to  be  fault  free.  Next  we  assume  that  the  upper 
BISTER  passes  all  its  tests,  while  the  lower  one  fails.  The  remaining  suspects  are  again  split  in 
two  sets  -  {C,D}  and  {E,F}  -  and  included  in  separate  BISTERs.  Next  we  assume  that  the  E  and 
F  are  found  to  be  fault  free.  But  now  the  remaining  suspects  -  C  and  D  -  reside  in  the  same  row 
and  may  not  be  included  in  separate  BISTERs.  To  complete  the  diagnosis,  we  rove  H-STAR  to 
bring  it  over  the  row  with  C  and  D,  which  now  may  be  tested  in  separate  horizontal  BISTERs. 
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Figure  13.  Example  of  Adaptive  PLB  Diagnosis 


The  time  for  the  diagnosis  procedure  is  determined  by  the  number  of  partial  reconfigurations 
needed  to  locate  all  the  faulty  PLBs.  Even  in  the  best  case,  when  we  have  a  single  faulty  PLB,  we 
need  at  least  three  different  STAR  configurations,  and  for  each  one  of  them,  the  full  set  of  six  or 
eight  rotations.  In  addition,  several  roving  steps  may  be  needed  to  bring  the  second  STAR  to  cover 
the  remaining  suspected  PLBs,  where  each  roving  step  is  done  by  another  partial  reconfiguration. 
Long  run-time  for  diagnosis  is  not  critical,  because  the  normal  system  operation  continues  with¬ 
out  interruption  during  this  period  [15].  However,  it  may  result  in  delaying  the  detection  of  a  new 
fault  occurring  during  the  execution  of  the  diagnosis  procedure. 

A  second  problem  with  this  procedure  results  from  the  need  to  group  suspected  PLBs  with  adja¬ 
cent  known-good  ones.  During  the  life  of  a  fault-tolerant  system,  the  set  of  fault-free  spares  gets 
gradually  smaller  as  more  faults  occur,  and  fault-free  PLBs  may  no  longer  exist  in  an  area  where 
they  are  needed  to  help  in  diagnosing  suspects.  In  such  a  situation  the  procedure  would  be  no 
longer  applicable. 

To  illustrate  a  third  shortcoming  of  the  design-and-conquer  technique,  consider  the  situation 
shown  in  Figure  14,  where  we  have  the  suspects  A,  B,  C,  and  D.  Recall  that  in  [2]  H-STAR  and  V- 
STAR  move  by  two  PLBs  each  roving  step  and  thus  each  STAR  can  only  be  positioned  in  one  of 
N/2  two-cell-wide  positions  in  the  FPGA.  Then  in  V-STAR  we  can  place  {A,B}  and  {C,D}  in  sep¬ 
arate  BISTERs,  and  in  H-STAR  we  can  separate  between  {A,C}  and  {B,D}.  But  if  all  these  four 
BISTERs  fail,  we  cannot  separate  the  suspects.  Indeed  both  diagonal  pairs  of  PLBs  -  {A,D}  and 
{B,C}  -  would  cause  failures  in  all  four  BISTERs.  Similarly,  all  sets  of  three  faulty  PLBs  and  the 
group  of  all  four  faulty  PLBs  in  the  2  by  2  area  located  at  the  intersection  of  the  V-STAR  and  H- 
STAR  are  indistinguishable.  This  problem  could  be  solved  by  allowing  STARs  to  have  one-PLB- 
wide  roving  steps  for  diagnosis,  but  this  solution  would  double  the  number  of  configurations  to  be 
stored  by  TREC,  and  significantly  complicate  the  control  programs. 


Ill  iiiii  iiii 


Figure  14.  Example  of  Inconclusive  Diagnosis 
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4.4  Locating  Faulty  PLBs  Using  BISTER  Results 


In  every  test  configuration,  TREC  records  the  set  of  failures  obtained  at  the  ORA  outputs.  Our 
improved  diagnosis  method  consists  of  two  stages.  First,  it  tries  to  locate  faulty  PLBs  based  on 
analyzing  the  test  results,  and  in  many  cases,  this  stage  is  sufficient  to  achieve  maximum  diagnos¬ 
tic  resolution.  The  frequent  case  of  a  single  faulty  PLB  falls  in  this  category.  For  cases  of  multiple 
faulty  cells  not  completely  resolved  by  the  first  stage,  a  second  stage  employs  the  divide-and-con- 
quer  strategy  based  on  additional  reconfigurations  [4,  3]. 

In  any  fault  location  procedure,  maximum  diagnostic  resolution  is  achieved  when  faults  are  iso¬ 
lated  within  an  equivalence  class  containing  all  the  faults  that  produce  the  observed  response.  If 
every  equivalence  class  has  only  one  fault,  we  say  that  the  fault  is  uniquely  diagnosed,  that  is, 
there  is  no  other  fault  which  can  produce  the  same  response.  In  our  case,  a  fault  is  one  faulty  PLB 
that  may  have  several  internal  faults,  and  the  response  is  the  set  of  failing  test  configurations  (also 
called  test  phases)  detected  at  the  outputs  of  the  ORAs.  We  begin  by  assuming  a  single  faulty  PLB 
in  the  LPGA,  then  we  analyze  the  case  of  multiple  faulty  PLBs,  and  we  also  discuss  locating 
faults  inside  a  defective  PLB. 

We  can  observe  that  the  PLBs  in  a  BISTER  tile  can  be  partitioned  into  two  disjoint  sets,  so  that 
every  set  contains  the  PLBs  that  are  pairwise  compared  when  configured  as  BUTs.  For  example, 
for  the  3  by  2  tile,  the  BUTs  in  configurations  1,3,  and  5  are  compared  only  among  themselves 
and  they  are  never  compared  to  the  BUTs  in  phases  2, 4,  and  6.  A  similar  partition  exists  for  the  4 
by  2  tile  as  well.  We  can  represent  the  relations  between  BUTs  that  are  pairwise  compared  and  the 
ORAs  that  observe  them  by  the  graphs  shown  in  Figure  15,  where  a  BUT  is  denoted  by  a  node  5,-, 
and  the  ORA  that  observes  BUTs  Bi  and  Bj  is  denoted  by  Oy.  We  can  view  such  a  graph  as  repre¬ 
senting  a  combined  test  session,  in  which  half  of  the  BUTs  in  the  tile  are  concurrently  tested.  Each 
tile  is  completely  tested  in  two  combined  test  sessions,  where  the  BUTs  in  one  session  are  ORAs 
in  the  other  one,  and  vice  versa. 

Theorem  1:  Any  single  faulty  PLB  is  guaranteed  to  be  uniquely  diagnosed. 

Proof:  When  the  faulty  PLB  is  configured  as  a  BUT,  it  is  detected  at  its  two  adjacent  ORAs,  and 
no  other  BUT  is  detected  at  the  same  two  ORAs.  Note  that  when  the  faulty  PLB  is  configured  as  a 
TPG  cell,  no  error  is  generated  even  when  the  fault  modifies  the  TPG  patterns,  because  the  two 


Figure  15.  Combined  test  sessions  a)  3  by  2  tile  b)  4  by  2  tile 
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BUTs  still  receive  the  same  patterns.  When  configured  as  an  ORA,  the  faulty  PLB  may  generate 
an  error,  but  this  will  be  in  addition  to  the  failures  observed  in  the  combined  session  when  it  is  a 
BUT,  and  these  results  are  sufficient  to  distinguish  it  from  any  other  single  faulty  PLB. 

As  a  consistency  check,  we  can  verify  that  the  sets  of  failures  obtained  at  the  two  ORAs  are  iden¬ 
tical. 

The  following  results  deal  with  the  location  of  a  group  G  of  faulty  PLBs  that  is  detected  in  at  least 
one  test  session.  To  uniquely  diagnose  G  means  to  identify  all  its  faulty  blocks  such  that  no  other 
group  (including  subsets  of  G)  can  produce  the  same  result.  Although  unique  diagnosis  is  not 
always  possible,  there  are  many  situations  when  it  can  be  guaranteed. 

We  will  analyze  the  interactions  among  faulty  PLBs  under  the  following  assumptions: 

Assumption  1:  A  TPG  with  faulty  PLBs  does  not  skip  the  patterns  that  detect  faults  in  a  BUT. 

Assumption  2:  A  faulty  PLB  is  not  detected  when  it  is  configured  as  an  ORA. 

Based  on  these  assumptions,  the  set  of  failing  phases  obtained  at  ORA  Oy  is  given  by  the  follow¬ 
ing: 

FOy  =  {FB.yjFB.)-Feqy  (1) 

where  FBj^  is  the  set  of  failing  phases  of  BUT  5^,  and  Peq^  is  the  set  of  failing  phases  of  both  5, 
and  Bj  that  have  identical  responses  (and  thus  do  not  cause  mismatches  at  0,y).  In  Figure  16,  the 
area  of  FOy  is  marked  by  diagonal  lines.  Note  that  FOy  is  empty  (0 )  when  both  B^  and  Bj  are  fault 
free,  or  when  the  faults  in  5,-  and  Bj  are  equivalent  (since  then  =  fb^  =  Feq^  ).  Also  note  that 
there  exists  one  situation  when  FOy  is  the  same,  no  matter  if  only  one  or  both  of  the  BUTs 
observed  at  Oy  are  faulty;  this  occurs  if  the  two  BUTs  have  the  same  sets  of  failing  phases 
( FB^  =  FBj ),  but  their  faulty  responses  are  never  the  same  {Feqy  =  0 ). 


Figure  16.  Figure  for  Equation  (1) 

Knowing  the  set  of  failing  phases  observed  at  Oy  and  the  complete  set  of  failing  phases  of  one  of 
the  two  faulty  BUTs,  we  can  determine  the  set  of  failing  phases  where  the  two  BUTs  have  identi¬ 
cal  responses  by  the  following: 

Feqy  =  FB.-FOy  =  FB.-FOy  (2) 

Based  on  FOy  and  one  of  the  two  sets,  we  can  also  compute  a  lower  bound  on  the  other  set  by 

FB.^(FOy-FBj)^Feqy  (3) 
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FBj  3  {FOjj  -  FB.)  u  Feqij 


(4) 


or  by 


Note  that  Equation  (3)  becomes  an  equality  when  FOy  and  FBj  are  disjoint: 

FB,=  FOyUFeqy  (5) 

This  occurs  when  FB^  c  fBj  and  Feq^  =  FB^  . 

For  the  diagnosis  procedure,  we  will  make  one  more  assumption: 

Assumption  3:  No  more  than  two  faulty  BUTs  have  identical  responses  in  the  same  failing  phase. 

To  justify  this  assumption,  recall  that  in  every  configuration  the  TPG  applies  exhaustive  tests  for 
that  mode  of  operation.  If  Assumption  3  is  not  satisfied,  it  means  that  we  would  have  three  or 
more  BUTs  with  equivalent  faults  for  that  mode  of  operation,  which  is  quite  an  unlikely  event. 

The  following  results  will  be  used  by  our  diagnosis  procedure. 

Lemma  1:  If  none  of  the  two  ORAs  observing  BUT  B  fails  in  phase  p,  then  B  does  not  fail  phase 
P- 

Proof:  Assume,  by  contradiction,  that  B  fails  phase  p.  Based  on  Assumption  1,  the  vectors  detect¬ 
ing  its  faults  are  generated  by  the  TPG.  Then  the  other  two  BUTs  observed  by  the  two  ORAs  must 
have  equivalent  faults  to  the  fault  of  B  in  phase  p,  because  p  is  not  reported  by  either  of  the  two 
ORAs.  But  then  we  would  have  three  BUTs  with  equivalent  faults  in  the  same  phase,  which  con¬ 
tradicts  Assumption  3.  Therefore,  B  does  not  fail  phase  p. 

Lemma  2:  If  the  two  ORAs  observing  the  same  BUT  report  no  failures,  then  their  common  BUT 
is  fault  free. 

Proof:  Based  on  Lemma  1,  the  common  BUT  of  the  two  ORAs  does  not  fail  any  phase. 

Lemma  3:  Any  BUT  failure  appears  at  least  at  two  ORAs. 

Proof:  Assume,  by  contradiction,  that  phase  p  is  reported  only  at  one  ORA.  Let  B  be  the  BUT  fail¬ 
ing  p,  and  O  be  the  other  ORA  observing  B.  Since  O  does  not  report  p  as,  a  failure,  the  other  BUT 
(say  C)  observed  by  O  must  have  identical  response  in  p.  But  the  other  ORA  observing  C  does  not 
report  pas  a  failure  either,  so  we  have  more  than  two  BUTs  with  equivalent  faults  in  p,  which  con¬ 
tradicts  Assumption  3.  Therefore,/?  must  appear  at  least  two  ORAs. 

Lemma  4:  If  we  have  failures  only  at  two  ORAs,  they  must  have  identical  failures. 

Proof:  Based  on  Lemma  3,  any  failure  must  appear  at  least  twice,  and  in  this  case  we  have  only 
two  failing  ORAs.  Hence  their  failures  must  be  identical. 
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Theorem  2:  (For  the  4  by  2  tile)  If  only  two  ORAs  observing  the  same  BUT  report  failure  in 
phase  p,  their  common  BUT  fails  in  phase  p. 

Proof:  In  Figure  17,  without  loss  of  generality,  assume  that  O12  and  O14  are  the  two  failing  ORAs 
reporting  as  a  failure.  Assume,  by  contradiction,  that  their  common  BUT  B j  does  not  fail  phase 
p.  Then  both  B2  and  B4  must  fail  p.  But  since  no  other  ORA  reports  p,  then  B^  must  also  fail  in 
phase  p,  and  its  response  must  be  identical  to  that  of  ^2  and  B4.  Hence  we  would  have  three  BUTs 
with  equivalent  faults  in  phase  p,  which  contradicts  Assumption  3.  Therefore,  B  must  fail  phase  p. 


Figure  17.  Figure  for  Theorem  2 

Lemma  5:  If  two  ORAs  observing  the  same  BUT  report  no  failures,  then  all  three  BUTs  observed 
by  them  are  fault  free. 


Proof:  Based  on  Lemma  2,  the  common  BUT  of  the  two  ORAs  is  fault  free.  So  each  one  of  the 
other  BUTs  is  compared  with  a  fault- free  one  by  an  assumed  fault- free  ORA,  and  no  failure  is 
reported.  Hence  the  other  two  BUTs  are  also  fault  free. 


Lemma  6:  (For  the  4  by  2  tile)  If  only  two  ORAs  without  a  common  BUT  fail  phase  p,  then  at 
least  one  pair  of  BUTs  between  the  two  ORAs  are  faulty  and  have  identical  response  in  phase  p. 


Proof:  In  Figure  18,  without  loss  of  generality,  assume  that  0/2  and  0^4  are  the  two  ORAs  with¬ 
out  a  common  BUT  failing  phase  p.  At  0^2,  P  may  be  caused  by  Bj  or  B2.  Let  us  assume  that  B y 
fails  p.  Because  p  is  not  observed  at  B4  must  also  fail  p  with  an  identical  response.  Similarly, 
had  we  assumed  that  B2  fails  p,  we  would  have  concluded  that  B2  and  B^,  have  equivalent  faults  in 
phase  p.  Note  that  all  four  BUTs  may  be  faulty  as  well,  but  in  this  case  the  equivalent  faults  in  the 
two  pairs  will  be  different. 


Figure  18.  Figure  for  Lemma  6 
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Note  that  under  the  eondition  of  Lemma  6,  we  eannot  reach  a  unique  diagnosis.  However,  we  can 
use  the  reduced  set  of  suspects  as  a  starting  point  for  the  divide-and-conquer  technique  to  test 
some  of  the  suspected  PLBs  in  separate  BISTER  tiles.  A  subsequent  example  will  show  how  the 
combined  technique  can  uniquely  diagnose  the  faulty  PLBs. 

Lemma  7:  Assume  that  two  out  of  three  ORAs  fail,  such  that  the  middle  one  has  no  failures,  and 
the  other  two  have  disjoint  failures.  Then  the  two  BUTs  observed  by  the  nonfailing  ORA  are  fault 
free  and  the  other  two  BUTs  are  faulty. 

Proof.  In  Figure  19,  without  loss  of  generality,  assume  that  has  no  failures,  while  0 14  and  O25 
have  disjoint  failures.  Then  we  want  to  prove  that  R5  and  B4  are  fault  free  and  Bj  and  B 2  are  faulty. 


Figure  19.  Figure  for  Lemma  7 

Assume,  by  contradiction,  that  one  of  the  two  BUTs  observed  by  the  nonfailing  ORA  (say,  Bf 
fails  phase  p.  Since  0^4  reports  no  failures,  B4  must  have  an  equivalent  fault  in p.  At  most  one  of 
Oj4  and  O22  may  fail  p,  because  their  failures  are  disjoint.  Then  we  have  to  analyze  two  cases: 

Case  1:  One  of  the  failing  ORAs  (say,  6)25)  shows  p  asa  failure.  Since  p  is  not  observed  at  Oj4, 
while  B4  fails  p,Bj  must  also  have  an  equivalent  fault  in  p.  In  this  case  we  would  have  three  BUTs 
with  equivalent  faults  in  p,  which  would  contradict  Assumption  3. 

Case  2:  None  of  the  failing  ORAs  reports  pas  a  failure.  But  this  would  mean  that  both  R;  and  B2 
must  also  have  equivalent  faults  in p,  since  p  does  not  appear  at  O44  or  O25.  In  this  case  we  would 
have  four  BUTs  with  equivalent  faults  in  p,  which  would  contradict  Assumption  3. 

Therefore,  both  B^  and  B4  are  fault  free.  Then  B2  is  the  source  of  the  failures  recorded  at  O25, 
while  those  observed  at  0 14  originate  at  Ry. 

As  a  consistency  check,  we  can  verify  that  the  failures  recorded  at  the  ORA  between  the  faulty 
BUTs  are  the  union  of  the  disjoint  sets  of  failures  obtained  at  the  other  two  failing  ORAs  (for  the 
example  above,  fOj2  =  fOj^  u  fo^j  ). 

The  following  examples  deal  with  the  results  of  only  one  combined  test  session.  Under  the 
assumption  that  a  faulty  PLB  is  detected  only  when  configured  as  a  BUT,  the  two  sessions  can  be 
independently  analyzed,  and  a  different  group  of  faulty  BUTs  may  be  identified  in  each  session. 
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Example  1:  Figure  20  shows  the  set  of  failing  phases  obtained  at  the  four  ORAs  in  one  eombined 
test  session  for  a  4  by  2  BISTER  tile.  Because  failures  in  test  phase  1  are  reported  only  at  O25  and 
0^4,  then  ^5  is  faulty  and  fails  in  phase  1  (by  Theorem  2).  In  the  same  manner,  B4  is  identified  as 
failing  phase  5, is  identified  as  failing  phase  9,  and  B2  is  identified  as  failing  phase  7.  Since 
failing  phase  8  is  reported  at  three  ORAs,  Theorem  2  does  not  apply  and  we  cannot  determine 
which  BUTs  fail  phase  8  (should  be  at  least  two).  Note  that  we  have  identified  all  four  BUTs  as 
faulty. 

5,8,9 
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Figure  20.  Figure  for  Example  1 

If  our  goal  is  also  to  identify  the  exact  set  of  failing  phases  for  every  faulty  PLB  (to  be  able  to 
reuse  its  fault-free  modes  of  operation  as  a  partially  usable  block),  we  can  use  the  divide-and-con- 
quer  technique  to  obtain  additional  information.  Let  us  assume  that  we  test  ^5  in  a  separate  BIS¬ 
TER  with  known  fault- free  PLBs  and  we  find  out  that  FB^  =  { 1,8}.  Then  from  Equation  (2)  we 
obtain  Feq24  =  { l,8}-{  1,5}  =  {8},  and  from  Equation  (4)  we  determine  that  B4 must  also  fail 
phase  8.  At  this  point  we  still  do  not  know  whether  5;  or  ^2  is  causing  0/2  to  report  failure  in 
phase  8,  so  we  again  use  the  divide-and-conquer  technique  and  test  in  a  separate  BISTER.  Let 
us  assume  that  we  get  FBi  =  {9}.  Then  from  Equation  (2)  we  obtain  Feqi2  =  0,  and  from  Equa¬ 
tion  (4)  we  determine  that  B4  must  also  fail  phase  8.  Now  we  know  the  exact  sets  of  failing  phases 
for  every  BUT. 

Note  that  the  original  divide-and-conquer  technique  by  itself  cannot  diagnose  this  quadruple  fault, 
because  no  matter  how  the  original  tile  is  divided,  any  BISTER  will  contain  one  faulty  PLB,  and 
the  initial  set  of  eight  suspected  PLBs  will  never  be  reduced.  In  contrast,  our  analysis  technique 
found  all  of  the  four  faulty  PLBs  without  any  reconfiguration,  and  relied  on  the  information  pro¬ 
vided  by  two  steps  of  divide-and-conquer  only  to  determine  the  exact  sets  of  failures  for  the  faulty 
PLBs. 

Example  2:  Assume  that  we  have  FO22  =  FO24  =  {3,5}  and  no  failures  at  O42  and  O14.  Based  on 
Lemma  5,  we  conclude  that  Bj,  B2,  and  B4  are  fault  free,  and  from  Theorem  2  we  determine  that 
^2  is  failing  phases  3  and  5. 

This  example  has  illustrated  the  common  pattern  of  a  single  faulty  PLB. 

Example  3:  Assume  that  we  have  F025={2,5},  FO;.^={3,4},  and  no  failures  at  0^4  (see 
Figure  21).  Because  FO22  and  FOj4  are  disjoint  while  0^4  reports  no  failures,  the  conditions  of 
Lemma  7  are  satisfied,  and  we  can  conclude  that  B^  and  B4  are  fault  free  and  Bj  and  B2  are  faulty. 
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Moreover,  from  Equation  (1)  we  find  that  FB /={2,5 }  and  F52={3,4} .  As  a  eonsisteney  check,  we 
can  verify  that  FOj2  ={2,3,4, 5}. 

3,4 


0 

Figure  21.  Figure  for  Example  3 

Example  4:  Assume  that  we  have  F02s={4},  FO;2={3},  F054={3,4},  and  no  failures  at  O14  (see 
Figure  22).  Since  only  two  ORAs  with  a  common  BUT  fail  phase  4,  from  Theorem  2  we  deter¬ 
mine  that  ^5  is  faulty  failing  phase  4.  Because  phase  3  is  a  common  failure  for  two  ORAs  without 
a  common  BUT,  from  Lemma  6  we  know  that  at  least  one  of  the  pairs  {81^4}  or  {82^2) 
identical  failures  in  phase  3.  This  is  not  a  unique  diagnosis,  so  we  select  one  of  the  four  suspects, 
say  B4,  to  be  tested  in  a  separate  BISTER,  and  let  us  assume  that  we  find  that  84  is  fault  free. 
From  Lemma  6  it  follows  that  is  also  fault  free,  and  then  82  and  8 2  must  be  the  faulty  PLBs 

with  identical  responses  in  phase  3. 

0 
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Figure  22.  Figure  for  Example  4 

Note  how  the  combined  use  of  failing  BIST  phases  along  with  the  divide-and-conquer  technique 
allowed  us  to  uniquely  diagnose  three  faulty  PLBs  that  cannot  be  diagnosed  by  either  technique 
when  used  by  itself. 

The  PLB  test  and  diagnosis  approach  was  applied  to  three  known-faulty  ORCA  2C15A  FPGAs 
from  Lucent  Microelectronics  that  had  been  previously  tested  with  the  off-line  approaches  pre¬ 
sented  in  [55]  and  [57].  One  of  the  FPGAs  was  found  to  contain  a  logic  fault  consistent  with  the 
diagnosis  result  obtained  in  [55].  The  results  of  the  test  phases  for  both  combined  test  sessions  are 
illustrated  in  Figure  23,  where  the  failing  test  phases  from  Table  2  are  given  for  each  session.  In 
Figure  23a,  test  phases  5  through  9  fail  with  failure  indications  given  at  two  ORAs  comparing  a 
common  BUT.  Based  on  Theorem  2,  we  conclude  that  B4  is  faulty.  In  Figure  23b,  we  have  failures 
only  at  one  ORA,  which,  at  face  value,  would  indicate  an  inconsistent  result  as  it  contradicts 
Lemma  3.  But  in  this  case,  we  know  from  the  first  session  that  this  ORA  is  faulty,  and  these 
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a)  Test  session  1  b)  Test  session  2 

Figure  23.  Diagnosis  of  Actual  Faulty  PLB 


results  show  that  Assumption  2  does  not  hold  in  this  case.  Thus  the  diagnosis  of  as  the  single 
faulty  PLB  is  consistent. 

4.5  Diagnosis  within  a  Faulty  PLB 

During  each  test  session,  the  TREC  records  all  ORA  failures.  This  allows  us  to  identify  the  failing 
mode(s)  of  operation  of  the  faulty  PLB  and  its  faulty  internal  module(s).  For  example,  lines  1 
through  9  in  Table  2  summarize  the  test  phases  developed  for  the  ORCA  2C  series  FPGA,  while 
lines  10  through  14  describe  the  additional  modes  of  operation  tested  for  ORCA  2CA  series. 
Phases  1  through  4  and  10  through  15  are  used  to  test  the  different  modes  of  operation  of  the 
LUT/RAM  module  of  the  PLB,  while  phases  5  through  9  are  used  to  test  the  FFs  and  their  modes 
of  operation.  Let  us  assume  that  a  PLB  fails  only  phases  5,  6,  and  7.  Since  this  faulty  PLB  passed 
the  exhaustive  tests  for  all  its  other  modes  of  operation,  it  may  be  safely  used  for  any  function  that 
does  not  involve  counting  (since  phases  8  and  9  pass,  the  FFs  work  correctly  as  registers).  This  is 
the  concept  of  a  partially  usable  block  [2,  15],  where  such  a  defective  block  is  reused  whenever 
possible  in  its  fault-free  modes  of  operation,  to  increase  the  effective  spare  capacity  of  the  FPGA 
in  FT  applications.  If  phases  5  through  9  are  the  only  ones  that  fail,  the  LUT  module  can  still  be 
used  as  a  RAM  or  to  perform  any  combinational  logic  function. 

When  the  BIST  results  do  not  clearly  indicate  which  component(s)  of  a  faulty  PLB  is  at  fault, 
additional  test  phases  are  added  to  further  isolate  and  identify  the  fault(s)  and  obtain  even  higher 
diagnostic  resolution.  Table  3  presents  a  list  of  additional  diagnostic  phases  for  the  ORCA  2C15A 
FPGA.  Phases  1  through  10  are  used  to  test  the  output  multiplexer,  ensuring  that  each  output  from 
the  LUTs  and  the  registers  can  reach  each  output  of  the  output  multiplexer.  Phases  1 1  through  14 
can  be  used  to  isolate  and  test  the  registers  in  their  FF  mode  of  operation,  while  phases  15  through 
18  can  be  used  to  test  the  registers  in  their  LATCH  mode  of  operation. 

In  Figure  5,  the  ORA  FF  stores  the  combined  result  of  comparing  four  pairs  of  BUT  outputs. 
Additional  diagnostic  resolution  can  be  obtained  by  storing  the  result  of  comparing  each  pair  of 
outputs  separately.  In  this  way  we  can  determine  which  LUT  output  or  which  FF  is  faulty. 

Figure  24  illustrates  this  ORA  implementation,  which  is  feasible  in  FPGAs  such  as  the  Xilinx 
4000  series  [61]  and  ORCA  3C  series  [35].  The  PLB  architecture  of  other  FPGAs,  such  as  the 
ORCA  2C  series  [35],  allows  comparing  only  two  pairs  of  BUT  outputs  per  FF;  in  this  case,  we 
can  locate  the  fault  to  only  one  out  of  two  LUTs  or  FFs. 
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Table  3:  Summary  of  Additional  Diagnostic  Phases  for  ORCA  2C  and  2CA 


Phase 

FF/Latch  Modes  &  Options 

LUT 

No. 

Logic  Tested  | 

No. 

FF/Latch 

Set/Reset 

Clock 

Clock 

Enable 

FF  Data 

Mode 

Outs 

LUT 

FFs 

MUX 

1-4 

Latch 

- 

active  high 

enabled 

PLB  input 

- 

4 

V 

- 

- 

- 

- 

- 

I-variable 

4 

V 

mm 

FF 

async.  set 

falling  edge 

active  low 

PLB  input 

- 

4 

B 

■El 

async.  reset 

rising  edge 

active  high 

PLB  input 

- 

4 

B 

in 

sync,  set 

rising  edge 

active  low 

PLB  input 

- 

4 

B 

■a 

sync,  reset 

falling  edge 

active  high 

PLB  input 

- 

4 

B 

13 

Latch 

async.  set 

active  low 

active  low 

PLB  input 

- 

4 

V 

14 

Latch 

async.  reset 

active  high 

active  high 

PLB  input 

- 

4 

B 

15 

Latch 

sync,  set 

active  high 

active  low 

PLB  input 

- 

4 

B 

16 

Latch 

sync,  reset 

active  low 

active  high 

PLB  input 

- 

4 

B 
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> 

- 
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1 

Figure  24.  ORA  with  Greater  Diagnostic  Resolution 


The  additional  diagnostic  configurations  shown  in  Table  3  were  applied  to  the  known-faulty 
ORCA  2C15A  PLB  illustrated  in  Figure  23  that  had  been  previously  tested  with  the  approaches 
presented  in  [55]  and  [57].  Two  of  the  FPGAs  -  Chip  1  and  Chip  2  -  exhibited  routing  faults  con¬ 
sistent  with  the  results  in  [57],  while  the  third  faulty  FPGA  -  Chip  3  -  was  found  to  contain  a 
logic  fault  consistent  with  the  diagnosis  result  obtained  in  [55].  The  results  of  the  diagnostic 
phases  indicate  that  the  fault  in  this  PLB  only  affects  the  four  FFs,  leaving  the  lookup  tables  and 
output  multiplexer  logic  unaffected.  Specifically,  the  fault  appears  to  be  located  in  or  affecting  a 
global  resource  to  the  FFs  such  as  the  clock  input,  global  reset/preset,  or  clock  enable  signal.  This 
defective  PLB  can  be  and  has  been  successfully  reused  as  a  PUB  to  implement  combinational 
logic  functions  in  a  FT  application  [15]. 

Diagnosis  within  a  PLB  to  determine  whether  a  LUT  or  a  FF  is  faulty  and  which  LUT(s)  or  FF(s) 
are  faulty  requires  additional  BIST  configurations  and  testing  sequences.  Once  the  normal  BIST 
and  diagnostic  configurations  for  PLBs  has  completed  and  the  faulty  PLBs  are  identified,  special 
PUB  diagnostic  configurations  are  generated  on  the  fly.  The  PUB  diagnostic  configurations  are 
generated  for  the  two  or  more  rotations  that  will  put  the  PLBs  identified  as  faulty  under  test.  The 
sliding  BISTER  tile  used  for  the  divide  and  conquer  diagnosis  procedure  is  then  used  to  test  the 
faulty  PLBs.  The  actual  PUB  diagnostic  configurations  are  different  from  those  used  to  test  the 
PLBs  during  the  normal  on-line  BIST  in  that  one  set  tests  the  LUTs  in  conjunction  with  the  output 
switch  box  while  another  set  tests  the  FFs  in  conjunction  with  the  output  switch  box.  There  are 
eight  LUT  PUB  diagnostic  configurations,  four  of  which  are  programmed  with  a  four-input  exclu- 
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sive-OR  function  while  the  other  four  are  programmed  with  a  four-input  exclusive-NOR  funetion. 
During  each  of  the  two  sets  of  four  LUT  configurations  for  PUB  diagnosis,  the  outputs  of  the 
LUTs  are  rotated  through  the  four  outputs  of  the  switeh  box.  In  this  way,  if  the  error  follows  the 
rotation,  the  fault  is  known  to  be  in  a  LUT,  and  the  faulty  LUT  is  identified  as  illustrated  in  the 
example  of  a  faulty  LUT  in  Figure  25. 


Figure  25.  PUB  Diagnosis. 

If  the  error  does  not  rotate,  on  the  other  hand,  then  the  switch  box  is  faulty  and  the  faulty  output  is 
identified.  The  same  rotation  strategy  is  used  for  the  FF  PUB  diagnostic  configurations  as  can  be 
seen  in  Figure  25  by  substituting  FF  for  LUT.  Any  single  faulty  LUT  or  FF  can  be  identified  and 
any  combination  of  a  single  faulty  LUT  and  single  faulty  FF  can  be  identified  with  the  current 
ORA  design  in  the  ORAC  2C  FPGA.  With  the  PLB  architecture  of  more  recent  ORCA  PLBs  (like 
the  4C  series  FPGA),  any  combination  of  multiple  faulty  LUTs  and  FFs  in  the  PLB  can  be  identi¬ 
fied.  This  same  level  of  diagnostic  resolution  ean  be  obtained  with  the  ORCA  2C  but  at  the 
expense  of  24  PUB  diagnostic  configurations  instead  of  12  configurations. 

Once  the  PUB  diagnosis  has  completed  for  a  given  BISTER  tile,  the  diagnostic  procedure  moves 
on  to  the  next  BISTER  tile  identified  as  having  faulty  PLBs  until  all  PLBs  that  have  been  identi¬ 
fied  as  faulty  with  in  a  STAR  have  undergone  PUB  diagnosis.  While  this  is  the  procedure  in  the 
current  implementation,  PUB  diagnosis  of  multiple  faulty  BISTER  tiles  can  be  performed  in  par¬ 
allel  at  the  expense  of  slightly  more  complieated  diagnostic  software  in  the  TREC.  When  PUB 
diagnosis  has  been  completed  within  a  STAR,  the  faulty  logic  resources  within  each  PLB  are 
passed  to  the  logic  FT  software  to  be  used  to  determine  whether  a  given  PLB  can  be  utilized  as  a 
PUB  by  the  next  system  function  to  occupy  that  position. 
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4.6  Testing  Interconnect 


The  programmable  interconnect  network  in  most  FPGAs  consists  of  wire  segments  that  can  be 
connected  via  configuration  interconnect  points  (CIPs).  The  basic  CIP  structure  consists  of  a 
transmission  gate  controlled  by  a  configuration  memory  bit  (Figure  26a).  There  are  three  types  of 
CIPs  which  we  refer  to  as  the  crosspoint  CIP  (Figure  26b),  the  breakpoint  CIP  (Figure  26c),  and 
the  multiplexer  CIP  or  MUX  CIP  (Figure  26d)  [35].  While  a  crosspoint  CIP  connects  wire  seg¬ 
ments  located  in  disjoint  planes  (a  horizontal  segment  with  a  vertical  one),  a  breakpoint  CIP  con¬ 
nects  two  wire  segments  in  the  same  plane.  The  multiplexer  CIP  is  a  group  of  2*  crosspoint  CIPs 
sharing  a  common  output  wire  and  controlled  by  k  configuration  bits,  such  that  at  any  time  only 
the  input  wire  addressed  by  the  configuration  bits  is  connected  to  the  output  wire  (in  Figure  26d 
A=l).  Wire  segments  within  the  programmable  interconnect  network  are  bounded  by  these  CIPs;  a 
given  segment  begins  at  a  CIP  and  ends  at  another  CIP. 


wire  A  wire  B 

- LJ - 


wire  B 


wire  A 


wire  A 


configuration 
memory  bit 


o 


wire  B 


wire  A 


c)  breakpoint  CIP 


wire  B 

output 


a)  basic  CIP  structure 


b)  crosspoint  CIP 


d)  multiplexer  CIP 


Figure  26.  CIP  Structure  and  Types 


Wire  segments  within  the  FPGA  are  either  global  or  local  routing  resources.  While  global  routing 
resources  are  used  to  interconnect  nonadjacent  PLBs  and  programmable  I/O  cells,  local  routing 
resources  are  specific  to  a  given  PLB  and  are  used  to  connect  that  PLB  to  global  routing  resources 
or  adjacent  PLBs.  Figure  27a  illustrates  a  simplified  view  of  the  global  routing  busses  associated 
with  a  single  PLB  in  an  ORCA  2C  series  FPGA,  which  provides  similar  routing  resources  to  that 
of  the  Xilinx  4000  series  FPGA  [61].  Horizontal  and  vertical  busses  are  denoted  by  h  and  v, 
respectively.  The  suffixes  xl,  x4,  xH,  and  xL  indicate  wire  segments  that  extend  across  1  PLB,  4 
PLBs,  half  the  PLB  array,  and  the  full  length  of  PLB  array,  respectively,  before  encountering  a 
breakpoint  CIP.  The  direct  busses  provide  connections  between  adjacent  PLBs;  we  denote  the 
four  direct  busses  as  dn,  ds,  de,  and  dw  denoting  directions  north,  south,  east,  and  west,  respec¬ 
tively.  For  every  PLB  there  are  two  sets  of  vertical  xl  busses  and  two  sets  of  horizontal  xl  busses, 
designated  vxlw,  vxle,  hxln,  and  hxls.  Not  visible  in  Figure  27  is  the  rotation  undergone  by  the 
wires  of  the  x4  and  xL  busses  as  they  pass  by  each  PLB  in  the  array.  Many  CIPs  are  available  to 
establish  different  connections  among  the  wire  segments.  The  diamond  symbol  of  a  breakpoint 
CIP  on  a  4-bit  bus  represents  a  group  of  4  individual  breakpoint  CIPs.  The  circle  denoting  a  cros¬ 
spoint  CIP  at  the  intersection  of  a  vertical  4-bit  bus  with  an  horizontal  4-bit  bus  represents  a  group 
of  4  individual  crosspoint  CIPs  between  corresponding  wires  in  the  two  busses,  as  illustrated  at 
the  upper  right  side  of  Figure  27a.  The  square  at  the  intersection  of  a  5-bit  direct  bus  with  a  4-bit 
xl  bus  represents  a  more  flexible  matrix  of  crosspoint  CIPs,  illustrated  at  the  upper  left  side  of 
Figure  27a.  The  local  routing  associated  with  each  PLB  is  illustrated  in  Figure  27b,  which 
includes  multiplexer  CIPs  at  the  inputs  to  the  PLB  and  bidirectional  buffers  with  connections  to 
the  vertical  and  horizontal  xH  and  xL  busses,  as  well  as  the  four  direct  busses.  Here  the  MUX 
CIPs  select  one  of  four  busses  for  connection  to  the  PLB  inputs.  Although  the  ORCA  2C15  FPGA 
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Figure  27.  Single  PLB  View  of  Routing  Resources 

is  a  relatively  small  FPGA,  it  nevertheless  contains  over  25,000  wire  segments  and  over  100,000 
CIPs. 

As  mentioned  above,  the  Xilinx  4000  series  routing  architecture  is  similar  in  many  respects  to  that 
of  the  ORCA  2C  series  FPGA  illustrated  in  Figure  27.  For  example,  the  Xilinx  4000  series  FPGA 
has  xl,  x2,  xL,  and  direct  busses.  The  x2  busses  also  rotate  at  each  PLB  location.  The  array  of 
CIPs  illustrated  at  both  the  upper  left  and  upper  right  comers  of  Figure  27a  are  present  in  the  Xil¬ 
inx  FPGA.  There  are  MUX  CIPs  in  the  Xilinx  FPGA  with  the  principle  difference  from  the 
ORCA  2C  series  being  that  the  MUX  CIPs  access  any  segment  in  a  bus  adjacent  to  the  PLB  in  one 
direction  (either  north,  south,  east,  or  west),  while  OCRA  2C  MUX  CIPs  access  a  subset  of  the 
segments  in  each  north,  south,  east,  and  west  bus.  Xilinx  4000  series  FPGAs  also  incorporate 
bidirectional  buffers  for  high  fanout  signals  or  signals  traveling  long  distances  across  the  FPGA 
the  same  as  in  ORCA.  Aside  from  the  topological  differences  in  their  data  book  drawings  of  the 
interconnect  stmcture,  there  is  not  much  difference  between  ORCA  and  Xilinx  interconnect  stmc- 
tures.  As  a  result,  the  BIST  approach  we  describe  here  can  be  applied  to  either  FPGA  as  well  as 
any  similar  interconnect  architecture. 
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4.6.1  Faults  in  FPGA  Interconnect 


The  fault  model  we  consider  is  typical  for  FPGA  interconnect  structures  [45]:  CIPs  stuck-closed 
(stuck-on)  and  stuck-open  (stuck-off),  wires  stuck  at  0  or  1,  open  wires,  and  shorted  wires. 
Detecting  the  CIP  faults  also  detects  stuck-at  faults  in  the  configuration  memory  bits  that  control 
the  CIPs.  For  generality,  we  allow  both  wired-AND/wired-OR  and  dominant  bridging  fault  mod¬ 
els  as  possible  behavior  for  shorts.  A  stuck-closed  CIP  creates  a  short  between  its  two  wires.  We 
assume  that  no  detailed  layout  information  is  available  regarding  adjacency  relations  between  seg¬ 
ments.  Instead,  we  use  only  rough  physical  data  (available  in  FPGA  data  books)  to  determine 
bunches  of  wires,  where  a  bunch  is  a  group  of  wires  that  may  have  pair-wise  shorts,  but  not  every 
wire  is  necessarily  adjacent  with  every  other  wire  in  the  bunch,  and  wires  in  different  bunches  are 
not  adjacent.  For  example,  all  the  vertical  wires  located  between  two  adjacent  PLB  columns  in 
Figure  27a  could  be  treated  as  one  bunch,  even  if  not  all  shorts  may  be  physically  feasible.  This 
treatment  makes  our  method  layout  independent  and  allows  us  to  ignore  the  bus  rotations,  which 
make  the  adjacency  relations  among  the  wires  of  the  same  bunch  change  along  a  row  or  a  column 
of  the  FPGA.  Although  a  short  between  a  vertical  and  an  horizontal  segment  is  not  physically  pos¬ 
sible,  a  crosspoint  CIP  stuck-on  has  an  equivalent  effect. 

To  detect  the  faults  described  above,  the  applied  tests  must  check  that  every  wire  segment  and  CIP 
is  able  to  transmit  both  a  0  and  a  1,  and  that  every  pair  of  wire  segments  that  may  be  shorted  can 
transmit  both  (0,1)  and  (1,0)  values.  A  test  that  applies  0  and  1  values  at  one  end  of  a  wire  and 
expects  the  same  values  at  the  other  end,  detects  any  stuck-open  fault  in  any  type  of  closed  CIP 
(crosspoint,  breakpoint,  or  multiplexer)  along  the  wire,  and  also  any  stuck-at  fault  affecting  any 
wire  segment.  Stuck-on  CIP  faults  require  that  opposite  logic  values  be  applied  to  both  wire  seg¬ 
ments  associated  with  a  open  CIP,  so  that  a  stuck-on  fault  will  yield  an  incorrect  logic  value  on 
one  of  the  two  wire  segments.  Since  only  the  value  on  the  wire  segment  which  is  part  of  a  WUT  is 
observed,  all  four  combinations  of  values  must  be  applied  to  the  two  wire  segments  separated  by 
the  open  CIP,  to  cover  both  the  AND  and  the  OR  types  of  short. 

A  multiplexer  CIP  requires  one  test  configuration  for  each  one  of  its  inputs.  Both  0  and  1  are 
applied  to  the  selected  input,  while  opposite  values  must  be  applied  to  the  non-selected  inputs. 
This  test  detects  the  stuck-off  fault  in  the  closed  CIP  that  connects  the  selected  input  to  the  output 
wire,  and  the  stuck-on  faults  in  the  open  CIPs  corresponding  to  the  unselected  inputs.  This  also 
provide  a  complete  test  for  the  selection  logic  of  the  multiplexer  [57].  Testing  MUX  CIPs  is  simi¬ 
lar  to  the  invisible  logic  problem  [56,  2].  The  problem  arises  because  FPGA  CAD  tools  generate 
the  configuration  bitstream  based  on  the  user  model,  which  will  never  include  the  functionally 
inactive  subcircuits,  or  inactive  inputs  to  MUX  CIPs  in  this  case.  The  result  is  that  the  testing  of 
the  MUX  CIPs  may  not  be  complete.  To  ensure  that  all  interconnect  is  properly  tested,  the  routing 
must  be  done  semimanually,  without  the  typical  FPGA  CAD  tools,  such  that  opposite  logic  values 
can  be  routed  to  the  inactive  inputs  of  the  MUX. 

4.6.2  BISTER  for  FPGA  Interconnect 

The  architecture  of  a  single  BISTER  within  a  STAR  is  illustrated  in  Figure  28  and  consists  of  a 
counter-based  TPG,  two  groups  of  WUTs,  and  one  or  more  comparison-based  ORAs.  A  WUT 
may  be  composed  of  several  wire  segments  connected  by  closed  CIPs.  To  check  local  intercon- 
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nect,  WUTs  may  also  go  through  PLBs  configured  as  identity  functions  (passing  their  inputs 
directly  to  their  outputs).  In  the  illustration  in  Figure  28,  the  WUTs  are  shown  by  bold  lines,  and 
the  aetivated  (elosed)  CIPs  are  shown  in  gray,  while  the  open  CIPs  are  white.  The  first  group  of 
WUTs  conneets  the  segments  P,  Q,  R,  and  S,  while  the  second  group  connects  X,  Y,Z,V,U,  and  W. 
Since  Y  and  V  connect  to  inputs  of  PLBs,  these  PLBs  are  configured  as  identity  functions  that  just 
pass  data  from  inputs  to  outputs.  Note  that  without  involving  PLBs  in  configuring  WUTs,  one 
cannot  achieve  a  eomplete  interconneet  test.  The  TPG  applies  exhaustive  patterns  to  the  WUTs, 

that  is,  all  possible  2”  test  patterns  to  every  group  of  n  WUTs.  While  walking  patterns  (walking  a 
1  through  a  field  of  Os  and  a  0  through  a  field  of  Is)  are  known  to  provide  a  eomplete  set  of  tests 
[34],  generating  exhaustive  patterns  with  a  n-bit  counter  requires  less  PLBs  than  generating  the 
two  n-bit  walking  patterns,  which  are  already  contained  in  the  exhaustive  set.  For  large  n,  we 
divide  the  n  WUTs  into  groups  of  k<n  wires,  and  apply  exhaustive  patterns  to  one  group  at  a  time, 
while  setting  the  other  n-k  wires  to  constant  values,  so  that  eventually  all  required  pairs  of  values 
will  be  applied.  The  open  CIPs  whieh  isolate  the  WUTs  from  the  rest  of  the  interconneet  must  be 
tested  for  stuck-closed  faults.  To  achieve  this,  the  TPG  also  controls  any  wire  that  may  become 
shorted  to  some  WUT  {A,  B,  C,  D,  and  E  in  Figure  28)  such  that  when  the  TPG  drives  a  0(1)  on 
the  WUTs,  it  must  also  set  A,  B,  C,  D,  and  E  to  1(0)  at  least  once  during  the  test  of  the  WUTs. 


Figure  28.  Basic  BIST  Structure  for  Interconnect 

A  swapper  cell  may  be  needed  as  part  of  the  BISTER  to  align  the  compared  signals  before  reach¬ 
ing  the  ORA.  The  need  for  the  swapper  cell  arises  due  to  the  nature  of  the  logic  equations  imple¬ 
mented  in  the  LUTs  of  the  ORA,  routing  limitations  on  the  inputs  to  the  PLB  implementing  the 
ORA,  and  bus  rotations  in  the  WUTs  such  as  the  x4  and  xL  busses  in  ORCA  or  the  x2  busses  in 
Xilinx  4000  FPGAs.  As  illustrated  in  Figure  29,  the  swapper  is  simply  a  combinational  function 


Figure  29.  Bus  Realignment 
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programmed  to  map  input  signals  to  output  signals  to  achieve  the  desired  alignment  of  the  WUTs 
at  the  ORA  inputs,  so  that  signals  al  and  a2  are  compared  with  bl  and  b2  in  the  LUT  fed  by  the 
PLB  inputs  A0-A3  while  signals  aO  and  a3  are  compared  with  signals  bO  and  b3  in  the  LUT  fed 
by  the  PLB  inputs  B0-B3. 

We  use  comparison-based  ORAs,  which  do  not  suffer  from  aliasing  problems  that  affect  methods 
based  on  compression,  such  as  signature  analysis.  Figure  30  illustrates  an  ORA  comparing  wires 
from  each  set  of  WUTs  being  tested  by  the  BISTER.  The  FF  stores  the  result  of  the  comparison, 
and  the  feedback  loop  latches  the  first  mismatch  in  the  FF.  The  Pass/Fail  result  FFs  of  all  ORAs 
are  connected  to  form  the  scan  chain  [18].  In  every  test  phase,  the  scan  chain  is  also  tested,  to 
assure  the  integrity  of  the  test  results.  Here  an  ORA  compares  the  patterns  propagating  on  two 
groups  of  WUTs,  which  should  be  identical  if  their  propagation  is  not  affected  by  faults.  The  only 
condition  that  will  make  this  scheme  fail  would  be  a  multiple  fault  consisting  of  identical  (or 
equivalent)  faults  in  the  two  compared  groups  (for  example,  both  groups  having  a  short  between 
the  third  and  fourth  wires).  Although  this  is  not  very  likely,  we  overcome  the  problem  by  testing 
each  set  of  WUTs  twice  and  comparing  a  given  set  of  WUTs  with  two  different  sets  of  WUTs  (a 
different  set  each  time  it  is  tested). 


Scan 


WUT^ 

WUTB 


(from  previous  ORA) 

Scan  Mode - 

TCK - 


Figure  30.  Integrated  ORA/scan  cell 

Testing  the  crosspoint  CIPs  connecting  global  horizontal  and  vertical  routing  busses  is  a  difficult 
problem,  since  most  global  horizontal  routing  resources  in  V-STAR,  and  similarly,  most  global 
vertical  routing  resources  in  H-STAR,  may  be  used  by  the  system  signals  connecting  the  two 
working  areas  separated  by  the  STAR.  Our  solution  is  to  construct  a  BISTER  that  spans  both  V- 
STAR  and  H-STAR,  as  illustrated  in  Figure  31.  To  test  all  these  CIPs  would  require  such  a  BIS¬ 
TER  for  every  possible  pair  of  positions  of  H-STAR  and  V-STAR,  which  means  0(N^)  configura¬ 
tions.  To  avoid  such  a  significant  increase  in  the  fault  latency,  we  adopt  the  following 
compromise:  we  test  only  a  subset  of  these  CIPs  on  every  full  horizontal  sweep,  but  making  sure 
that  the  subset  is  different  on  every  sweep.  That  is,  we  keep  the  H-STAR  fixed  and  we  test  only 
the  subset  of  CIPs  in  the  current  intersection  of  H-STAR  and  V-STAR,  but  for  every  full  horizon- 
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Figure  31.  Testing  global  crosspoint  CIPs 
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tal  sweep  of  the  V-STAR,  the  position  of  H-STAR  is  different.  This  progression  of  testing  and  rov¬ 
ing  of  the  STARS  is  illustrated  in  Figure  32  for  an  FPGA  consisting  of  an  8  by  8  array  of  PLBs. 


13  14  15  16  17  18 


Figure  32.  Roving  Order  for  Testing  Interconnect 


The  V-STAR  and  H-STAR  are  used,  respectively,  to  test  vertical  and  horizontal  routing  resources. 
In  addition,  several  configurations  are  needed  to  test  different  bunches  of  WUTs  within  a  STAR. 
Once  a  BISTER  detects  a  failure  in  interconnect,  diagnostic  configurations  are  downloaded  to 
repeatedly  subdivide  the  suspected  WUTs  into  two  subsets  that  will  be  included  in  new  BISTERs 
to  be  compared  with  known  fault-free  WUTs.  This  adaptive  diagnosis  process  is  guaranteed  to 
locate  any  fault  detected. 

Next,  we  describe  the  basic  implementation  of  the  various  BISTER  test  sessions  for  on-line  test¬ 
ing  the  different  types  of  routing  resources  and  CIPs  in  the  interconnect  network.  The  entire  set  of 
test  configurations  is  organized  as  a  sequence  of  five  test  sessions,  where  we  define  a  test  session 
as  a  group  of  test  phases  targeting  the  same  types  of  faults.  Table  4  summarizes  the  five  test  ses¬ 
sions  for  testing  interconnect  faults  in  FPGA.  Some  of  these  test  sessions  may  not  be  required  for 
some  FPGAs  (depending  on  the  architecture  of  the  interconnect  network).  Similarly,  the  number 
of  test  phases  associated  with  each  test  session  will  vary  due  to  the  specific  differences  in  arrange¬ 
ments  of  CIPs  as  well  as  in  the  bus  structures,  such  as  the  number  of  wires  in  each  bus. 


Table  4:  Summary  of  interconnect  BIST  test  sessions 


Session 

Target  Faults 

1 

shorts,  opens,  and  some  CIP  laults  m  global  busses 

2 

CIP  faults  in  xl  busses,  shorts  &  opens  in  local  routing 

3 

faults  in  crosspoint  CIPs  between  global  &  local  busses 

4 

multiplexer  CIP  faults 

5 

faults  in  crosspoint  CIPs  between  global  busses 

The  first  test  session  tests  all  the  global  busses  (such  as  those  busses  illustrated  in  Figure  27a)  for 
shorts  and  opens.  In  addition,  these  configurations  test  all  breakpoint  CIPs  along  the  global  busses 
for  stuck-off  faults.  The  BISTER  architecture  for  testing  vertical  busses  is  illustrated  in  Figure  33, 
where  the  bold  lines  represent  busses  under  test.  Two  PLBs  (denoted  by  T)  form  a  2?7-bit  counter- 
based  TPG.  The  lower  n-bits  are  used  to  drive  an  exhaustive  set  of  test  patterns  on  the  A  and  B 
WUT  busses.  The  upper  n-bits  are  used  to  drive  exhaustive  test  patterns  on  the  two  sets  of  adja¬ 
cent  busses  (denoted  by  the  dashed  lines).  The  complete  set  of  2n-bit  exhaustive  patterns  can  then 
detect  shorts  between  wire  segments  within  the  WUTs  as  well  as  between  the  adjacent  busses 
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B  A 


Figure  33.  BISTER  for  Vertical  Busses 

regardless  of  any  bus  rotations  within  any  bus,  sinee  the  2?7-bit  count  sequence  ensures  opposite 
logic  values  will  be  on  adjacent  wire  segments.  Two  BISTERs  are  placed  within  the  STAR  such 
that  the  A  WUTs  are  sandwiched  between  two  dashed  busses.  Multiple  test  phases  are  needed  to 
test  all  the  vertical  busses  within  the  V-STAR.  For  example,  in  the  first  test  phase  the  BISTER  is 
configured  as  shown  in  Figure  33,  while  during  the  second  test  phase,  the  busses  are  reversed  so 
that  the  busses  denoted  by  dashed  lines  are  under  test.  Subsequent  test  phases  have  the  WUTs 
moving  across  the  global  busses  associated  with  a  given  column  of  PLBs  until  all  vertical  busses 
have  been  tested.  A  similar  set  of  test  phases  are  used  to  test  the  horizontal  busses  in  the  H-STAR. 
The  ORA  (denoted  by  O)  compares  the  A  WUTs  and  B  WUTs  driven  by  the  same  TPG  cell.  The 
PEBs  denoted  by  S  are  swapper  cells  which  may  or  may  not  be  needed  depending  on  the  particu¬ 
lar  bus  and/or  PLB  architecture. 

In  test  session  2,  we  test  most  of  the  global-to-local  xl  crosspoint  CIPs  for  stuck-off  faults  and  xl 
breakpoint  CIPs  on  the  direct  busses  for  stuck-on  faults.  Note  that  xl  routing  resources  are  com¬ 
mon  to  all  FPGA  interconnect  architectures  and  provide  the  primary  routing  resources  for  local, 
low-fanout  nets.  The  test  phases  in  test  session  2  use  the  basic  testing  arrangement  shown  in 
Figure  34  where  the  busses  are  interconnected  via  crosspoint  CIPs  such  that  they  crisscross 
through  the  local  routing  resources  of  the  PFB,  instead  of  passing  straight  down  on  a  single  global 
bus.  As  illustrated  in  Figure  34,  one  set  of  WUTs  (grey  line)  goes  through  the  FUTs  of  the  PFB, 
while  the  other  set  of  WUTs  (dashed  line)  goes  through  the  bidirectional  buffers  such  that  the 
bidirectional  buffers  are  also  completely  tested  in  both  directions  during  this  test  session.  Break¬ 
point  CIPs  are  tested  for  stuck-on  faults  by  the  crisscross  routing  since  opposite  logic  values  will 
be  applied  to  both  sides  of  the  breakpoint  CIPs  as  illustrated  in  Figure  34.  Note  that  no  global  hor¬ 
izontal  routing  resources  are  used  in  this  V-STAR  BISTER  to  facilitate  interconnection  of  the  par¬ 
titioned  system  function  that  is  still  in  operations  outside  of  the  STARs.  As  in  the  case  of  test 
session  1,  there  are  two  BISTERs  contained  in  the  STAR.  Since  routing  through  the  local  inter¬ 
connect  resources  is  required  during  this  test  session,  local  routing  resources  can  also  be  tested 
during  this  test  session,  including  the  bidirectional  buffers  associated  with  each  PFB  which  must 
be  tested  in  both  directions.  Multiple  test  phases  are  typically  required  for  this  test  session  for 
both  the  V-STAR  and  the  H-STAR  as  a  result  of  the  different  types  and  numbers  of  crosspoint 
CIPs  on  these  busses. 
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Figure  34.  Example  of  Test  Phases  for  Local  Routing 

In  test  session  3,  we  test  the  crosspoint  CIPs  between  direct  busses  and  global  busses.  The 
BISTER  structure  used  for  these  test  phases  is  illustrated  in  Figure  35.  Because  of  the  long  spans 
of  these  global  busses  across  multiple  PLBs,  we  must  use  multiple  ORAs.  Test  patterns  are  sup¬ 
plied  on  the  busses  and  arrive  at  ORA  cells  through  the  crosspoint  CIPs  being  tested  in  these  con¬ 
figurations.  Note  that  we  have  already  tested  these  busses  for  shorts  (in  test  session  1)  and  now  we 
are  only  testing  for  crosspoint  CIPs  stuck-off. 


Figure  35.  Testing  Crosspoint  CIPs  between  Direct  and  Global  Busses 

In  test  session  4,  we  test  the  multiplexer  CIPs  at  the  input  to  the  PLB  for  stuck-on  and  stuck-off 
faults.  Signals  passing  through  the  PLB  will  detect  multiplexer  CIP  stuck-off  faults  for  those  CIPs 
selected  to  drive  the  PLB.  Stuck-on  faults  in  the  unselected  inputs  of  the  multiplexer  CIPs  are 
detected  by  supplying  the  opposite  logic  values  of  those  being  passed  to  the  PLB  on  the  selected 
inputs.  An  example  of  these  test  phases  is  shown  in  Figure  36,  where  the  signals  passing  through 
the  PLB  are  reordered  on  the  bus  to  the  MUX  CIP  in  order  to  supply  the  opposite  logic  value  to 
the  unselected  inputs  to  the  MUX  CIP.  In  addition  to  testing  the  local  routing  wire  segments,  cros¬ 
spoint  CIPs,  and  bidirectional  buffers  during  session  2,  it  is  possible  to  also  test  some  of  the  MUX 
CIPs  on  those  WUTs  that  pass  through  the  LUTs  of  the  PLB.  However,  test  session  2  is  perhaps 
the  most  complicated  and  difficult  to  implement  of  all  five  test  session  and,  as  a  result,  maintain¬ 
ing  the  MUX  CIP  test  phases  as  a  separate  test  session  has  advantages.  For  those  FPGAs  where 
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Figure  36.  Example  of  MUX  CIP  Test  Phases 

the  MUX  CIPs  are  independent  of  vertical  and  horizontal  global  routing  resources,  all  test  phases 
of  test  session  4  can  be  performed  with  either  the  V-STAR  or  the  H-STAR  alone. 


Both  the  V-STAR  and  H-STAR  are  needed  to  test  the  crosspoint  CIPs  between  the  global  busses 
in  the  FPGA.  This  is  because  vertical  global  routing  resources  passing  through  the  H-STAR  and 
horizontal  routing  resources  passing  through  the  V-STAR  are  reserved  for  interconnecting  the  par¬ 
titioned  working  areas  that  are  separated  by  the  STARs.  Since  testing  these  crosspoint  CIPs 
requires  both  horizontal  and  vertical  global  routing  resources,  the  testing  can  take  place  only  at 
the  intersection  of  the  H-STAR  and  V-STAR.  The  four  test  phases  for  testing  the  crosspoint  CIPs 
on  a  given  set  of  global-to-global  busses  are  illustrated  in  Figure  37.  Two  PLBs  residing  in  the  H- 
STAR  are  used  as  a  2n-bit  counter-based  TPG  with  one  n-bit  field  supplying  test  patterns  to  the 
activated  crosspoint  CIPs,  and  with  the  other  «-bit  field  providing  test  patterns  to  the  global  busses 
bordering  the  nonactive  crosspoint  CIPs.  In  this  way  the  activated  CIPs  are  tested  for  stuck-off 
faults,  while  the  nonactivated  CIPs  are  tested  for  stuck-on  faults.  In  Figure  37a  and  b,  the  upper 
half  of  crosspoint  CIPs  at  the  STAR  intersection  are  tested  for  stuck-off  faults  while  the  lower  half 
of  the  crosspoint  CIPs  are  tested  for  stuck-on  faults.  The  testing  roles  are  reversed  in  Figure  37c 
and  d.  Two  PLBs  in  the  V-STAR  are  used  as  ORAs  with  routing  resources  associated  with  the  2- 
by-2  array  of  PLBs  at  the  intersection  of  the  two  STARs  forming  the  set  of  CIPs  under  test  during 
this  test  session.  This  basic  arrangement  of  PLBs  is  dipped  and/or  rotated  in  order  to  test  the  four 
comers  of  the  FPGA. 
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Figure  37.  Testing  global  crosspoint  CIPs 


d)  phase  4 


The  on-line  interconnect  BIST  approach  was  implemented  for  the  ORCA  2C  and  2CA  series 
FPGAs.  The  number  of  test  phases  required  for  each  test  session  is  summarized  in  Table  5.  These 
five  test  sessions  perform  a  complete  test  of  all  routing  segments  and  their  interconnecting  CIPs 
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within  a  V-STAR  in  a  total  of  25  configurations  and  within  an  H-STAR  in  a  total  of  18  configura¬ 
tions.  This  number  is  less  than  the  number  of  test  phases  required  for  a  complete  on-line  PLB 
BIST  [3]  (104  for  the  ORCA  2C  series  and  136  for  the  ORCA  2CA  series  FPGAs).  However,  the 
39  total  test  phases  for  our  on-line  interconnect  BIST  do  not  provide  the  same  diagnostic  resolu¬ 
tion  as  the  PLB  testing  within  a  STAR.  Therefore,  we  have  developed  additional  diagnostic  test 
phases  for  interconnect  fault  location  and  identification  described  in  the  next  subsection. 


Table  5:  Summary  of  interconnect  BIST  test  sessions  and  phases 


Test 

Session 

Target  Faults 

No.  of 
VSTAR 
Phases 

No.  of 
HSTAR 
Phases 

1 

shorts,  opens,  and  some  CIP  faults  m  global  busses 

7 

7 

2 

CIP  faults  in  xl  busses,  shorts  &  opens  in  local  routing 

4 

4 

3 

faults  in  crosspoint  CIPs  between  global  &  local  busses 

3 

3 

4 

multiplexer  CIP  faults 

7 

0 

5 

faults  in  crosspoint  CIPs  between  global  busses 

^  1 

The  on-line  interconnect  test  sessions  and  their  associated  test  phases  were  implemented  and 
applied  to  three  known- faulty  ORCA  2C15A  FPGAs  from  Lucent  Technologies.  Two  of  the 
FPGAs  (Chip  1  and  Chip  2)  exhibited  routing  faults  consistent  with  the  results  in  [57],  while  the 
third  faulty  FPGA  (Chip  3)  was  found  to  contain  only  the  previously  described  logic  fault  (one 
faulty  PLB  with  faulty  FFs).  The  interconnect  testing  results  are  summarized  in  Table  6. 


Table  6:  Summary  of  Interconnect  BIST  Test  Results  on  Faulty  Chips 


Chip 

STAR 

STAR 

Position 

Test 

Session 

Test 

Phases 

1 

V-STAR 

4 

2 

1 

3 

3 

4 

1,2 

H-STAR 

5 

1 

5,6 

2 

1,3 

2 

V-STAR 

6 

1 

7 

2 

1,2 

H-STAR 

1 

2 

1,2,  3,  4 

3 

2 

H-STAR 

3 

1 

7 

3 

1 

3 

V-STAR 

9 

3 

2,3 

4 

4,  5,  6,  7 

H-STAR 

2 

3 

1,2,3 

While  these  particular  test  sessions  were  not  specifically  designed  for  diagnosis,  the  on-line  BIST 
configurations  do  provide  a  significant  level  of  diagnostic  resolution  that  is  much  higher  than  that 
obtained  in  our  original  off-line  routing  BIST  [57].  The  diagnostic  resolution  that  can  be  obtained 
directly  from  the  testing  results  is  illustrated  in  Figure  38  for  Chips  1  and  2.  While  the  exact  loca¬ 
tion  of  the  faulty  interconnect  cannot  be  determined  by  the  testing  results,  it  can  be  seen  from  the 
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Figure  38.  Testing  Results  from  Faulty  ORCA  2C15s 

failing  areas  of  the  faulty  FPGAs  in  Figure  38  that  the  suspect  regions  can  be  quickly  narrowed 
down  to  19  percent  of  the  total  chip  area  in  Chip  1  and  28  percent  in  Chip  2. 

4.7  Interconnect  Diagnosis 

Once  a  fault  is  detected,  we  must  identify  the  fault  and  its  location  to  a  sufficient  level  of  accuracy 
to  facilitate  efficient  reconfiguration  around  the  fault  for  fault  tolerant  applications.  To  achieve  the 
necessary  diagnostic  resolution,  we  use  a  number  of  techniques  described  in  this  section.  We 
begin  by  analyzing  the  results  of  the  on-line  BIST  to  assist  in  narrowing  the  search  space  of  the 
fault(s).  For  example,  when  a  STAR  contains  multiple  BISTERs,  a  failing  result  from  one  of  those 
BISTERs  provides  an  initial  target  for  the  application  of  additional  diagnostic  configurations. 
Since  all  WUTs  are  tested  twice  and  compared  to  two  other  sets  of  WUTs,  we  can  determine  in 
which  WUT  (A  or  B)  was  faulty  through  analysis  of  the  ORA  failure  indications.  This  is  illus¬ 
trated  in  Figure  39a  where  during  the  second  test  phase  the  middle  WUTs  are  swapped  between 
TPGs.  For  faults  in  only  one  set  of  WUTs,  we  can  identify  which  WUT  contains  faults  by  looking 
at  whether  the  failures  moved  to  the  other  ORA  when  the  set  of  WUTs  were  swapped  which  will 
uniquely  identify  the  set  of  WUTs  with  faults.  When  we  have  multiple  faulty  WUTs  that  prevent  a 
unique  identification  of  the  correct  sets  of  faulty  WUTs,  we  apply  additional  diagnostic  phases 
that  compare  the  suspected  sets  of  WUTs  with  other  sets  of  WUTs  (different  from  the  ones  ini¬ 
tially  used). 

To  identify  the  faulty  region  in  a  set  of  WUTs,  we  add  multiple  ORAs  along  the  run  and  reverse 
the  direction  of  the  test  pattern  flow  through  the  WUTs  during  a  second  test  phase.  This  is  illus¬ 
trated  in  Figure  39b.  Another  technique  is  to  subdivide  the  BISTER  into  smaller  BISTERs  and 
retest  to  identify  the  faulty  region  of  the  WUTs  as  shown  in  Figure  39c.  This  divide-and-conquer 
method  can  be  repeated  to  obtain  high  diagnostic  resolution. 


1st  test  2nd  test  1st  test  2nd  test  1st  test  2nd  test  1st  test  2nd  test 

a)  2-testing  WUTs  b)  adding  ORAs  c)  divide  &  conquer  d)  reroute  wire  segments 


Figure  39.  Diagnostic  Configurations 
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Finally,  to  identify  the  faulty  wire  in  a  set  of  WUTs,  we  reduee  the  set  of  WUTs  to  a  single  wire 
under  test,  as  illustrated  in  Figure  39d.  After  cycling  through  all  of  the  wires  that  made  up  the 
original  set  of  WUTs,  we  determine  the  faulty  segment  or  CIP  by  rerouting  to  eliminate  a  segment 
at  a  time  during  the  subsequent  tests.  An  alternative  to  identifying  the  faulty  wire  from  the  set  of 
WUTs  is  to  rotate  the  test  pattern  bits  (from  the  TPG)  on  the  wires  in  the  suspect  WUT  while 
keeping  the  test  pattern  assignments  constant  on  the  known  good  set  of  WUTs.  This  requires  inde¬ 
pendent  ORAs  for  each  pair  of  wires  being  compared  but  this  approach  aids  in  obtaining  good 
diagnostic  resolution  without  increasing  the  number  of  diagnostic  test  phases. 


The  definition  of  a  segment  depends  on  the  type  of  its  fault.  For  opens,  a  segment  is  a  wire 
bounded  by  two  CIPs  with  no  other  CIPs  in  between.  For  example,  in  Figure  40,  Bj  and  B2  are 
breakpoints,  and  A/  and  are  crosspoints.  The  heavy  line  denotes  a  signal  net  routed  through  Ay 
and  X2.  The  dashed  line  denote  segments  that  are  connected  to  the  same  signal  net,  although  they 
don’t  serve  to  connect  it  to  other  segments  (note  that  a  crosspoint  CIP  does  not  break  continuity 
along  the  horizontal  or  vertical  segments).  Each  one  of  the  three  segments  BjXj,  XjX2,  and  A252 
may  have  an  open  fault.  For  shorts,  however,  a  segment  is  a  wire  bounded  by  breakpoint  CIPs.  For 
example,  a  short  between  B jXj  and  another  segment  Z  will  affect  the  entire  segment  BjB2,  which 
will  be  treated  as  a  single  faulty  segment. 


^7 


Figure  40.  Wire  Segments  and  Faults 


Let  P  and  Q  be  two  faulty  segments.  To  distinguish  between  P  and  Q  being  both  open  and  P  and 
Q  being  shorted  to  each  other,  we  configure  a  TPG  to  generate  the  four  possible  patterns  (00,  01, 
10,  1 1)  for  the  two  suspects,  and  record  the  cases  that  fail.  Any  mismatch  for  the  00  or  1 1  patterns 
would  indicate  an  open,  while  any  mismatch  for  the  01  or  10  patterns  would  indicate  a  short.  In 
case  of  an  open,  we  may  initially  only  know  the  open  segment  between  two  breakpoints  CIPs. 
However,  we  can  diagnose  further  to  determine  which  internal  segment  is  open  by  using  the  strat¬ 
egy  illustrated  in  Figure  39d.  Here  we  reroute  portions  of  the  segment  based  on  the  crosspoint  CIP 
locations  along  that  segment  in  order  to  determine  the  relative  location(s)  of  the  open(s).  For 
example,  in  Figure  40  we  can  diagnose  each  one  of  the  three  segments  5yAy,  AyA2,  and  X2B2 
independently  by  the  TPG  to  ORA  connections.  In  this  example,  the  dotted  lines  would  extend 
beyond  the  breakpoint  CIPs  and  represent  two  different  configurations  of  the  BIST  structure  (one 
for  diagnosing  5yAy  and  the  other  for  A252),  while  the  third  configuration  would  be  represented 
by  the  bold  line. 

Application  of  the  additional  diagnostic  test  configurations  were  applied  to  both  FPGAs.  From 
these  diagnostic  configurations,  we  identified  a  single  routing  fault  in  Chip  1  at  the  intersection  of 
the  two  STARS.  Furthermore,  we  were  able  to  determine  that  the  fault  is  a  short  in  a  local  routing 
segment  in  row  10  column  8.  In  addition,  we  were  able  to  determine  that  the  faulty  segment  is  not 
shorted  to  another  interconnect  segment  but  is  possibly  shorted  to  either  the  logic  or  the  configu- 
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ration  memory.  This  is  an  important  finding  since  previous  work  in  FPGA  logic  and  interconnect 
testing  and  diagnosis  has  assumed  that  faults  are  either  in  the  logic  or  in  the  interconnect 
resources  such  that  the  two  domains  can  be  considered  independently.  However,  this  actual  manu¬ 
facturing  defect  indicates  that  the  FPGA  diagnosis  problem  is  much  more  complex  than  previ¬ 
ously  thought. 

It  is  obvious  from  the  BIST  results  for  Chip  2  that  there  are  multiple  routing  faults  (at  least  two 
faults).  Application  of  the  diagnostic  test  configurations  confirmed  that  there  are  indeed  two 
faults.  The  first  fault  is  a  short  in  a  local  routing  segment  in  row  1  column  12  (again  at  the  inter¬ 
section  of  the  two  failing  STARs).  Like  the  short  in  Chip  1,  the  diagnostic  tests  indicate  that  this 
short  in  Chip  2  is  also  shorted  to  either  the  logic  or  the  configuration  memory.  Therefore,  we  con¬ 
ducted  further  testing  to  identify  what  these  segments  were  actually  shorted  to,  in  other  words,  the 
other  half  of  the  short.  We  were  unable  to  conclusively  determine  a  short  to  either  logic  or  to  the 
configuration  memory,  but  we  found  another  possible  scenario  to  be  a  resistive  short  to  the  power 
bus.The  second  fault  in  Chip  2  is  in  the  horizontal  global  routing  resources  in  row  5  (specifically 
the  hx4s  bus  in  Figure  27)  and  does  not  effect  V-STAR  testing.  This  fault  is  a  short  between  three 
of  the  wires  in  the  four- wire  bus  and  the  short  is  located  between  columns  6  and  8. 

4.8  Fault  Injection  Emulation 

Since  it  is  difficult  to  obtain  actual  faulty  FPGA  (particularly  with  logic  faults  as  opposed  to  rout¬ 
ing  faults)  we  have  developed  the  capability  to  insert  faults  into  specific  PLBs  and  specific  CIPs 
via  reconfiguration  of  the  PLBs  and  CIPs.  This  provides  a  capability  similar  to  physical  fault 
insertion,  and  gives  us  the  ability  to  emulate  both  random  and  clustered  faults  within  the  FPGA. 
We  have  used  this  method  for  verifying  both  our  fault  detection  and  diagnosis  approach,  and  the 
fault  tolerance  mechanism. 

The  basic  idea  is  similar  to  the  fault  insertion  based  on  fault  masks  used  in  a  parallel  fault  simula¬ 
tion,  but  in  this  case  we  operate  on  the  configuration  bit  files  just  prior  to  each  download  of  the 
system  and  BISTER  tile  configurations.  In  parallel  fault  simulation,  each  bit  of  a  computer  word 
is  used  to  simulate  a  faulty  circuit  during  fault  simulation.  As  a  result,  a  16-bit  or  32-bit  word  can 
be  used  to  simulate  16  or  32  faulty  circuits  in  parallel.  This  requires  the  following  set  of  logical 
operations  be  performed  on  each  logic  value  at  each  node  in  the  circuit  netlist  during  parallel  fault 
simulation  [1]: 

Z  =  VF  +  FS  (6) 

where  ‘+’  is  the  logical  OR  operator  and  is  the  logical  AND  operation.  In  this  expression,  V 
represents  the  logic  value  at  a  given  node  and  would  be  contained  in  an  A-bit  computer  word  to 
represent  the  values  for  N  faulty  circuits  in  parallel.  The  fault  mask,  F,  contains  a  ‘  1  ’  in  each  bit 
position  for  which  a  faulty  circuit  being  simulated  contains  a  fault  at  the  given  node,  and  S  gives 
the  stuck-at  values  (either  stuck-at-0  or  stuck-at-1)  that  are  to  be  injected  for  those  faults  active  at 
that  node  in  the  parallel  fault  simulation.  Therefore,  the  fault  mask  F  acts  as  multiplexing  control 
values  which  let  the  normal  circuit  logic  value  of  V  pass  to  the  node  value  Z  in  the  case  of  those 
parallel  faulty  circuits  where  no  fault  exists  at  the  given  node;  otherwise  the  stuck-at  logic  value 
replaces  the  normal  circuit  logic  value  [1]. 
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Using  this  basic  idea  of  the  fault  mask,  our  fault  injection  emulator  performs  a  similar  operation 
on  the  configuration  file  to  be  downloaded  into  the  FPGA.  We  start  with  the  original  configuration 
file,  C,  to  be  downloaded  into  the  FPGA  to  perform  the  desired  system  function.  This  file  is  the 
automatically  generated  by  the  CAD  tools  that  support  the  partieular  SRAM-based  FPGA(s) 
being  used  in  the  system.  We  then  create  a  fault  mask  file,  F,  and  a  stuck-at  value  file,  S,  to  com¬ 
plete  the  fault  injeetion  operation  as  follows: 

D  =  C  F  +  F  S  (7) 

where  D  is  the  resultant  configuration  file  with  injected  faults  to  be  downloaded  into  the  FPGA. 
The  size  of  the  fault  mask,  F,  and  the  stuck-at- values  file,  S,  is  the  same  as  that  of  the  original  con¬ 
figuration  file,  C,  and  the  resultant  download  configuration  file,  D.  A  logie  1  in  any  given  bit  posi¬ 
tion  in  the  fault  mask  will  replaee  the  original  eonfiguration  data  bit  with  the  eorresponding  value 
specified  in  the  stuck-at  value  file  in  the  resultant  download  configuration  file. 

In  our  method,  the  generation  of  a  new  download  file  takes  place  using  the  operation  deseribed 
above  prior  to  eaeh  download  to  the  FPGA.  As  a  result,  faults  ean  be  changed  in  the  fault  mask 
and  stuck-at  value  files  at  any  time.  This  facilitates  the  emulation  of  permanent  faults  (by  main¬ 
taining  those  faults  in  the  fault  mask  and  stuck-at  value  files)  or  transient  faults  (by  removing 
those  faults  from  the  fault  mask  after  some  period  of  time).  In  addition,  there  ean  be  any  number 
of  faults  specified  in  the  fault  mask  with  the  location  of  the  faults  being  randomly  distributed  or 
elustered.  Since  the  fault  injection  operation  takes  plaee  prior  to  each  download,  the  fault  mask 
ean  be  modified  prior  to  each  download  to  add  or  subtract  faults  temporally. 

The  faults  that  can  be  emulated  by  this  method  include  stuck-at  faults  in  the  logic  resources  (LUT 
and  FFs)  or  funetional  faults  (such  as  a  FF  that  acts  as  a  latch  or  it  is  activated  on  the  opposite 
edge  of  the  eloek).  The  faults  that  can  be  emulated  in  the  programmable  intereonneet  network 
include  wires  shorted  or  open  in  combinations  that  can  be  made  by  the  appropriate  CIPs  turned  on 
or  off  in  the  faulty  mode  of  operation.  It  is  important  to  note  that  no  FPGA  logie  or  routing 
resourees  are  needed  to  emulate  these  faults  sinee  only  manipulation  of  the  configuration  memory 
bits  (via  manipulation  of  the  configuration  file)  is  required  to  emulate  the  faults.  However,  we 
have  encountered  a  few  limitations  with  this  method.  For  example,  the  LUTs  in  most  FPGAs  can 
also  be  eonfigured  to  operate  as  small  RAMs  (larger  RAMs  ean  be  constructed  from  multiple 
LUTs).  While  permanent  stuck-at  bits  in  the  LUT  can  be  emulated  with  our  technique,  we  cannot 
emulate  permanent  stuek-at  bits  when  the  LUT  is  in  a  RAM  mode  of  operation  beeause  the  sys¬ 
tem  function  can  overwrite  our  stuek-at  value  downloaded  into  the  FPGA  once  system  operation 
takes  over. 

The  difficulty  of  using  this  fault  injection  emulation  method  is  determining  the  configuration 
memory  bit  mapping  for  fault  emulation.  In  other  words,  what  does  a  given  bit  in  the  eonfigura¬ 
tion  file  control  within  the  FPGA  architecture?  FPGA  manufacturers  have  been  very  reluctant  to 
release  their  configuration  memory  bit  maps  on  the  grounds  that  it  may  reveal  proprietary  archi¬ 
tectural  information  to  competitors.  This  argument  turns  out  to  be  rather  weak  sinee  the  CAD 
tools  provided  by  the  FPGA  manufacturers  typically  contain  all  the  features  needed  to  reverse 
engineer  the  configuration  memory  bit  mapping.  For  example,  using  the  graphieal  architeetural 
editor  (such  as  EPIC  in  the  ORCA  Foundry  tool  suite),  one  ean  aetivate  any  given  CIP  in  the 
FPGA  and  subsequently  generate  the  configuration  file  to  determine  the  location  of  the  bit  con- 
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trolling  that  particular  CIP  in  the  FPGA.  While  this  is  a  somewhat  tedious  process,  the  regularity, 
symmetry,  and  repetitive  structure  of  the  FPGA  simplifies  the  process  by  allowing  the  determina¬ 
tion  of  a  given  eonfiguration  memory  bit  for  a  given  PLB  to  be  translated  to  any  PLB  via  the 
appropriate  offset  in  the  configuration  memory  address. 

Most  SRAM-based  FPGAs  have  their  configuration  memory  partitioned  into  vertical  slices  (also 
referred  to  as  frames),  each  with  a  unique  address.  Within  the  frame,  configuration  bits  repeat 
down  the  eolumn  to  eontrol  the  PLBs  and  routing  CIPs  within  that  frame.  Once  a  set  of  frames 
have  been  mapped  to  establish  the  FPGA  logic  and  routing  resources  that  the  bits  in  those  frames 
control,  the  subsequent  set  of  frames  for  the  next  eolumn  of  PLBs  will  have  identieal  bit  maps 
with  the  only  change  being  the  eorresponding  frame  address  offset.  The  exeeption  to  this  is  the 
inter-quadrant  routing  that  is  found  in  some  FPGAs  (this  contains  specialized  routing  resources 
for  global  funetions  sueh  as  clocks);  this  results  in  an  additional  offset,  depending  on  which  of  the 
four  quadrants  the  fault  is  to  be  injeeted. 

Once  the  bit  and  frame  locations  of  all  desired  faults  to  be  emulated  have  been  determined,  faults 
can  be  injeeted  in  these  loeations  by  aetivating  the  correct  fault  mask  bits  in  F  and  specifying  the 
associated  stuck-at  value  in  S.  Since  these  are  text  files,  they  ean  be  easily  modified  manually  or 
via  a  dedicated  program  that  allows  faults  to  be  added  or  removed  from  the  text  file.  In  our  fault 
injection  system,  we  create  a  text  file  that  list  faults  to  be  emulated  using  a  meaningful  naming 
convention.  As  many  faults  as  desired  can  be  listed  in  this  text  file.  A  speeialized  program  ealled 
Faulting  then  takes  this  information  and  converts  the  fault  location  data  (the  row,  column,  type, 
and  particular  element  data)  to  the  appropriate  bit  loeation  within  the  frames  of  the  fault  mask  file 
by  inserting  a  1  in  the  appropriate  location  for  that  fault.  Similarly,  the  appropriate  bit  in  the  stuck- 
at  value  file  is  manipulated  by  FaultGen  for  each  fault  in  the  fault  list.  This  process  continues  until 
all  faults  from  the  fault  list  have  been  entered  into  the  fault  mask  and  stuek-at  value  files.  Once  the 
fault  and  stuek-at  value  files  have  been  eonstrueted,  the  faults  ean  be  injected  into  the  FPGA  on 
the  next  download. 

The  fault  list  ean  be  changed  as  a  funetion  of  time  and  the  FPGA  can  be  reeonfigured  (by  re- 
downloading)  to  inject  the  new  set  of  faults.  This  allows  for  the  addition  of  faults  to  the  system  or 
the  removal  of  faults  for  transient  fault  emulation.  Programs  can  be  used  to  create  the  fault  list  for 
random  or  clustered  spatial  or  temporal  fault  distributions.  This  facilitates  eflheient  and  effective 
evaluation  of  testing  and  diagnosis  teehniques  as  well  as  fault  tolerant  approaehes.  For  demonstra¬ 
tions  of  our  BIST  and  BIST-based  diagnosis,  as  well  as  our  logic  and  interconnect  FT  techniques, 
we  have  ineluded  the  ability  to  interaetively  specify  the  faults  to  be  emulated  using  a  graphical 
user  interfaee  (GUI),  shown  in  Figure  41,  where  the  background  shows  the  20  by  20  array  of 
PLBs  for  an  ORCA  2CI5  FGPA.  This  GUI  allows  injection  or  removal  of  faults  at  any  time  while 
the  FPGA  is  operating.  A  row  and  eolumn  location  is  selected  by  elieking  on  an  area  of  the  PLB 
array. 
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Figure  41.  TREC  and  Fault  Injection  Emulation  GUI 
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5.  Fault-Tolerant  Techniques 

5.1  Basic  Features 

In  most  conventional  FT  methods,  faults  are  detected  within  the  working  part  of  the  system,  and 
they  must  be  located  and  bypassed  as  fast  as  possible  so  that  normal  system  operation  can  resume 
as  soon  as  possible.  For  our  roving  STARs  approach,  we  divide  the  FPGA  into  two  distinct  parts: 
the  STAR  areas  where  all  testing  takes  place,  and  the  working  area  where  the  system  function 
operates.  A  consequenee  of  our  roving  STARs  strategy  is  that  the  faults  we  detect  are  always  in  a 
STAR  area  of  the  FPGA,  and  thus  they  do  not  affect  the  working  area  of  the  FPGA.  So,  we  don’t 
need  to  interrupt  the  normal  operation  to  replace  the  faulty  resource  by  a  fault-free  one,  sinee  the 
former  is  not  performing  any  system  function  when  the  fault  is  detected.  Once  a  fault  is  detected, 
we  first  set  a  flag  to  notify  the  operating  system  that  a  rollback  may  be  required  if  it  is  determined 
that  mission-critical  data  may  have  been  corrupted  since  the  last  time  the  faulty  element  was 
tested.  Another  important  difference  from  conventional  techniques  is  that  fault  diagnosis  and  FT 
reconfiguration  do  not  have  severe  real-time  constraints.  Since  normal  system  operation  continues 
in  the  working  area,  we  can  allow  much  more  time  for  accurate  diagnosis  and  for  computing  any 
required  fault-bypassing  configurations,  compared  with  an  approach  in  which  normal  operation  is 
interrupted  for  diagnosis  and  reconfiguration. 

Fault  tolerance  is  performed  after  faults  are  detected  and  diagnosed  (located).  Reconfiguration  for 
logic  fault  tolerance  and  reconfiguration  for  interconnect  fault  tolerance  are  distinet  tasks.  First  we 
determine  if  the  system  function  can  continue  to  work  correctly  in  the  presence  of  the  located 
fault(s).  In  many  situations  this  is  possible  and  then  no  reconfiguration  is  needed.  If  a  fault  does 
affect  the  system  function,  we  determine  alternate  configurations  that  avoid  the  faulty  resource. 

Besides  being  applied  to  avoid  faults,  reconfiguration  can  be  used  to  reduce  the  performance  deg¬ 
radation  cause  by  fault  avoidanee.  In  other  FT  approaehes,  the  system  clock  speed  must  be  set  to 
accommodate  the  worst-case  FT  configuration  that  is  possible.  In  the  event  that  the  worst  case 
never  occurs,  the  system  is  penalized  with  a  slower  than  necessary  clock  speed.  We  have  intro¬ 
duced  the  concept  of  an  adaptive  system  clock  implemented  by  a  clock  generator  whose  program¬ 
mable  period  can  be  set  by  the  TREC  (via  the  control  interface  between  the  TREC  and  the 
FPGA).  The  initial  clock  frequency  is  set  to  the  maximum  value  allowed  in  the  fault-free  circuit. 
If  the  circuit  critical  path  changes  as  a  result  of  FT  reconfiguration,  TREC  adjusts  the  clock  rate 
based  on  post-routing  timing  analysis.  The  timing  analysis  is  run  inerementally  only  for  the  signal 
nets  affected  by  reconfiguration.  Without  such  an  adjustment  the  clock  must  be  slow  enough  to 
work  for  any  possible  FT  reconfiguration.  In  contrast,  adjusting  a  programmable  clock  period  will 
introduce  timing  penalties  only  when  required,  as  a  result  of  new  faults.  This  contributes  to  a  more 
graceful  system  degradation  as  faults  occur. 

In  the  following  sections,  we  first  review  related  research,  and  then  describe  the  details  of  our 
approach  to  logic  and  interconnect  fault  tolerance.  We  introduce  our  three-level  approaeh  that  we 
apply  during  both  logic  and  interconnect  fault  tolerance.  Then  in  later  subsections,  we  describe 
specific  details  and  unique  features  of  both  our  logic  and  interconnect  FT  approaches. 
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5.2  Related  Research 


5.2.1  Logic  Fault  Tolerance 

In  their  FT  technique,  Kumar  et  al.  used  adaptive  customization  to  avoid  defective  sections  and 
artificially  enhance  a  programmable  gate  array’s  yield  [30],  Their  technique  was  strictly  off-line 
and  fault  tolerance  was  achieved  by  bypassing  faulty  programmable  resources  at  the  time  the  cir¬ 
cuit  was  initially  mapped  to  the  FPGA.  They  described  a  two-step  process  to  achieve  defect  toler¬ 
ance:  test  of  an  unprogrammed  device  to  locate  defective  components  and  program  the  device  to 
avoid  the  defective  portions.  They  presented  heuristics  for  adaptively  programming  an  FPGA  in 
the  presence  of  faults. 

Hatori  et  al.  introduced  redundancy  as  a  FT  method  for  FPGAs  [23].  Their  method  used  special¬ 
ized  selector  circuitry  to  reconfigure  FPGA  circuits  in  the  presence  of  faults.  Similar  to  methods 
used  for  fault  tolerance  in  SRAMs,  an  additional  column  of  programmable  logic  blocks  is  added 
to  the  FPGA.  In  the  event  that  a  logic  block  was  faulty,  all  functions  mapped  to  the  column  where 
the  faulty  block  was  located  were  shifted  toward  the  spare  column.  All  subsequent  columns  of 
functions  between  the  column  with  the  faulty  block  and  the  spare  column  were  also  shifted  a  dis¬ 
tance  of  one  column  toward  the  spare  column.  Their  selector  circuits  required  approximately  2 
percent  additional  area  overhead,  and  they  required  spare  columns.  Since  they  eliminated  an 
entire  column  for  one  fault,  their  fault  tolerance  was  low,  and  they  estimated  a  performance  degra¬ 
dation  of  5  percent  when  compared  to  similar  non-FT  FPGAs. 

Durand  and  Piquet  proposed  an  FT  FPGA  architecture  with  the  ability  to  repair  itself  in  the  pres¬ 
ence  of  faults  [8].  They  used  a  special  multiplexer  with  self-repair  and  self-diagnosis  capabilities. 
Their  special  circuitry  was  capable  of  detecting  logic  failures  during  run-time,  and  the  architec¬ 
ture  had  a  limited  capability  to  reconfigure  itself  dynamically.  Their  FT  technique  allocated  two 
extra  columns  of  cells,  and  in  the  event  that  a  fault  was  detected,  they  bypassed  the  faulty  column 
by  shifting  toward  the  spare  column.  Since  the  authors  reported  that  more  than  complete  logic 
redundancy  was  required  for  their  method,  this  approach  results  in  a  large  area  overhead. 

Narasimhan  et  al.  developed  a  FT  technique  for  FPGAs  or  wafer-scale  integrated  arrays  [42,  43]. 
They  use  a  pebble  shift  algorithm  to  reconfigure  around  faulty  blocks.  Their  method  is  flexible.  It 
is  not  limited  to  one  fault  per  row,  column,  or  tile.  However,  it  must  be  performed  off-line,  and  it 
is  requires  the  application  cease  execution  for  reconfiguration. 

Howard  and  Tyrrell  described  concepts  for  increasing  yield  on  FPGAs  [24].  Their  FT  idea  was  to 
insert  extra  bypassing  columns  or  rows  into  the  FPGA.  They  looked  at  both  removing  an  entire 
row  (column)  if  it  contained  a  fault  and  removing  only  the  faulty  block  from  the  row.  Their  meth¬ 
ods  were  strictly  for  yield  enhancement  and  the  faults  were  bypassed  at  the  time  the  configuration 
for  the  FPGA  was  downloaded.  They  also  introduced  global  long  lines  to  reduce  the  performance 
hit  associated  with  reconfiguration. 

Kelly  and  Ivey  used  redundancy  to  detect  faults  in  applications  mapped  to  FPGAs  [29].  Their  FT 
technique  used  a  shift  method  to  reconfigure  in  the  presence  of  faults.  They  incorporated  a  recon¬ 
figuration  switch  to  reconfigure  and  used  normal  place  and  route  tools  for  mapping  circuits  to 
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FPGAs.  A  step-based  switch  configuration  algorithm  was  used  to  configure  the  special  switches 
around  faulty  elements. 

Cuddapah  and  Corba  used  Xilinx  SRAM-based  FPGAs  to  demonstrate  the  FT  capabilities  of 
FPGAs  [7].  In  their  study,  they  randomly  picked  logic  blocks  to  be  faulty.  Then  they  reconfigured 
the  circuit  around  these  faults  using  commercially  available  tools.  The  main  contributions  of  their 
work  were  an  algorithm  to  determine  fault  coverage  of  a  design  and  a  definition  of  the  fault  recov¬ 
ery  rate  for  any  given  design  implemented  in  an  SRAM-based  FPGAs.  Additionally,  they  demon¬ 
strated  that  fault  recovery  was  feasible  on  FPGAs  by  other  than  modular  redundant  methods. 

Dutt  and  Hanchek  et  al.  developed  a  method  to  increase  FPGA  yield  [9,  19].  Their  method  uses 
node  covering  and  reserved  routing  resources  to  replace  the  functionality  of  faulty  cells.  One  row 
(column)  of  cells  is  reserved  for  spares.  If  a  cell  in  any  given  column  (row)  goes  bad,  the  function¬ 
ality  of  all  cells  in  the  column  (row)  from  the  faulty  cell  to  the  spare  cell  is  shifted  toward  the 
spare  cell.  Spare  routing  resources  are  used  to  eliminate  overhead  of  rerouting  the  updated  circuit 
placement.  The  main  advantage  of  this  method  is  that  it  is  very  fast.  Since  the  spare  resources 
have  already  been  allocated  to  cover  a  limited  number  of  faults,  the  reconfiguration  time  is  linear 
with  respect  to  the  number  of  faults.  There  are  several  issues  with  this  methodology.  The  most  sig¬ 
nificant  problem  is  the  limitation  of  the  fault  coverage.  The  number  of  faulty  cells  that  can  be  cov¬ 
ered  is  limited  to  one  in  each  column  (row).  If  two  faults  were  found  in  the  same  row,  this  method 
breaks  down.  There  is  also  an  associated  resource  overhead.  For  an  A  by  A  array  of  cells,  at  least 
A  cells  must  be  left  unused.  There  is  also  a  routing  overhead  caused  by  the  reservation  of  routing 
resources.  Additionally,  when  cells  are  moved  there  is  an  associated  performance  degradation 
associated  with  increasing  the  length  and  number  of  programmable  interconnect  resources. 

Emmert  and  Bhatia  developed  a  FT  technique  for  incrementally  reconfiguring  FPGA  mapped  cir¬ 
cuits  around  faulty  programmable  resources  [11,  13].  They  used  mini-max  grid  matching,  to 
match  faulty  PLB  locations  to  unused  spare  PLB  resources.  Then  they  incorporated  a  shift  meth¬ 
odology  to  shift  PLB  logic  cell  functions  between  the  faulty  PLB  and  its  matched  spare  location 
toward  the  spare  location.  The  shift  method  reduced  the  performance  degradation  caused  by 
matching  logic  cell  functions  to  spares  that  were  spatially  separated  by  large  distances.  They 
applied  their  method  to  the  Xilinx  FPGA.  Later,  Lakamraju  and  Tessier  improved  on  the  idea  of 
shifting  logic  by  introducing  the  idea  of  shifting  logic  within  a  PLB  [33],  and  thus  they  reduced 
the  amount  of  reconfiguration  required  for  logic  fault  tolerance. 

Each  et  al.  developed  low-overhead  FT  systems  [31,  32].  Their  basic  approach  was  to  partition  a 
design  into  a  number  of  tiles.  Each  tile  was  allocated  a  number  of  spare  PLBs.  In  the  event  a  fault 
in  a  tile  was  found,  a  spare  PLB  was  used  to  replace  the  faulty  PLB.  Their  reconfiguration 
approach  provided  multiple  configurations  for  each  tile.  In  the  event  that  a  fault  was  detected,  the 
configuration  associated  with  the  fault  was  downloaded.  The  main  drawback  of  this  method  is  low 
fault  tolerance.  It  is  limited  to  one  fault  per  tile.  If  multiple  faults  occur  in  close  proximity  (within 
a  tile),  this  method  breaks  down.  There  is  no  ability  to  draw  spares  from  other  tiles  to  cover  sev¬ 
eral  faults  in  a  single  tile. 

With  the  exception  of  Lach  et  ah,  all  previous  methods  rely  on  reconfiguration  around  faults  to  be 
calculated  on-line.  This  requires  fast  computations  in  order  to  maintain  high  availability  for  the 
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FT  system.  The  preeompiled  eonfigurations  of  Lach  et  al.,  on  the  other  hand,  significantly  reduce 
system  down-time. 

5.2.2  Interconnect  Fault  Tolerance 

Many  papers  have  focused  on  fault  tolerance  for  FPGAs;  however,  most  previous  work  is  targeted 
primarily  toward  logic  fault  tolerance.  In  this  section  we  discuss  previous  research  in  the  area  of 
fault  tolerance  for  FPGAs  that  include  interconnect  fault  tolerance. 

Dutt  and  Hanchek  developed  an  off-line  method  to  increase  FPGA  yield  [9,  10].  Their  method 
uses  reserved,  spare  routing  resources.  Spare  routing  resources  are  used  to  cover  faulty  intercon¬ 
nect  resources.  Spare  interconnect  resources  are  allocated  as  a  grid,  and  when  a  fault  occurs,  it  is 
replaced  by  an  extra  segment  or  grid  of  segments.  This  method  is  very  fast,  but  is  limited  in  the 
fault  patterns  that  it  can  tolerate;  for  example,  it  cannot  tolerate  a  cluster  of  faults  in  the  same  local 
area.  There  is  also  an  overhead  caused  by  the  reservation  of  routing  resources. 

Mahapatra  and  Dutt  proposed  a  dynamic  method  for  allocating  interconnect  resources  to  bypass 
faults  only  after  faults  have  been  located  [36].  If  the  new  required  segments  conflict  with  the  cur¬ 
rent  usage  of  routing  tracks,  the  layout  is  incrementally  modified  to  make  room  for  the  new  seg¬ 
ments. 

Incremental  routing  for  logic  and  interconnect  FT  was  first  introduced  by  Emmert  and  Bhatia  [12, 
13].  Their  incremental  router  used  the  read/ write  capability  of  the  FPGA  configuration  memory  to 
ripup  and  reroute  without  a  netlist.  Later  Dutt  et  al.  presented  similar  ideas  on  incremental  routing 
[10]. 

Roy  and  Nag  developed  a  FT  method  for  interconnect  faults  [47].  They  introduced  the  concept  of 
compatibility  for  interconnect  faults,  and  demonstrated  certain  cases  where  single  interconnect 
faults  could  be  handled  independently.  However,  in  [47]  they  do  not  handle  multiple  interconnect 
faults. 

In  [32],  Lach  et  al.  extend  their  logic  tiling  method  to  consider  interconnect  faults.  Similar  to  their 
method  for  handling  logic  faults,  several  replacement  configurations  are  precomputed  for  pro¬ 
grammable  signals  that  do  not  extend  beyond  the  tile,  so  that  each  configuration  leaves  some  rout¬ 
ing  resources  unused.  Then  when  a  fault  occurs,  they  try  to  find  a  configuration  that  does  not  use 
the  faulty  resource;  this  method,  however,  is  not  able  to  tolerate  all  possible  faults  or  a  group  of 
faults  in  the  same  tile.  For  signals  that  do  extend  between  tiles,  they  propose  two  alternatives.  In 
some  cases  the  tile  size  can  be  enlarged  to  encompass  the  signals.  This  works  well  for  hierarchical 
architectures  like  CPLDs  and  for  dedicated,  linked  interconnect  paths  like  fast  carries.  For  signals 
that  are  routed  through  interconnects  that  extend  across  several  tiles,  they  leave  a  long  segment 
vacant,  to  be  used  to  replace  any  parallel  segment  in  the  same  area.  But  after  one  faulty  segment  is 
replaced  by  the  long  spare,  no  additional  faults  can  be  tolerated  in  that  region. 

Our  approach  overcomes  the  main  limitations  of  previous  work  -  the  need  to  stop  the  system 
function  to  allow  for  diagnosis  and  computing  fault-bypassing  reconfigurations,  and  the  inability 
to  tolerate  several  interconnect  faults  in  the  same  region.  Our  approach  is  the  only  one  integrated 
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within  a  system  that  provides  a  eomplete  test  and  diagnosis  strategy.  Additional  features  will  be 
presented  in  the  next  sections. 

5.3  Multilevel  Fault  Tolerance  Approach 

A  major  advantage  of  our  reconfiguration-based  approach  is  that  we  can  tolerate  a  large  number 
of  both  logic  and  interconnect  faults.  In  this  subsection  we  describe  our  three-level  approach. 

Level  1  consists  of  leaving  a  STAR  (or  possibly  both  STARs)  parked  where  faults  have  been 
detected.  While  the  STAR  is  parked  over  the  faults,  system  operation  goes  on  without  interruption 
because  the  faults  are  not  located  in  the  working  area.  STAR  parking  —  done  for  both  PLB  faults 
and  interconnect  faults  —  continues  while  the  faults  are  diagnosed  by  the  TREC,  possibly  using 
additional  configurations  for  diagnosis  in  the  parked  STAR. 

The  detected  faults  may  affect  the  next  slice  of  the  working  logic  to  be  relocated  in  the  parked 
STAR  position.  The  currently  unused  logic  and  routing  resources  in  the  working  area  are  consid¬ 
ered  spares  to  be  used  to  bypass  faults.  The  first  question  is  whether  the  application  logic  is 
mapped  such  that  after  relocation  it  would  fall  on  the  newly  diagnosed  faulty  resources.  If  the 
faults  affect  only  spare  resources,  then  no  action  is  required.  Otherwise,  the  next  question  is 
whether  the  faulty  resources  are  usable  in  the  context  of  the  functionality  required  by  the  system 
function.  The  next  subsections  will  discuss  usability  in  more  detail.  If  the  fault  is  not  compatible 
with  the  current  configuration,  then  our  FT  methods  incrementally  reconfigure  the  working  area  to 
avoid  the  faults  before  the  working  area  is  relocated  in  the  next  step  of  the  STAR  roving  process. 
Some  fault-bypassing  rovmg  configurations  (FABRICs)  can  be  precompiled,  but  most  faults  will 
require  new  FABRICS  to  be  computed  on-line;  this  computation  also  takes  place  while  the  STAR 
is  parked  over  the  faults. 

A  major  advantage  of  STAR  parking  is  that  testing  can  continue  while  a  STAR  is  parked.  Let  us 
assume  that  H-STAR  is  the  parked  STAR.  After  the  faults  in  H-STAR  have  been  located,  and 
while  TREC  is  computing  FABRICS,  V-STAR  can  continue  roving  and  testing.  This  process  will 
still  test  all  PLBs  and  all  vertical  interconnect,  but  the  horizontal  interconnect  resources  usually 
tested  by  the  now-parked  H-STAR  will  not  be  tested  during  this  period.  Thus,  a  partial  testing  pro¬ 
cess  is  active  even  when  one  STAR  is  parked. 

Level  2  occurs  when  we  move  the  parked  STAR  off  of  the  faults  and  it  restarts  roving.  During 
Level  2  we  apply  the  precompiled  or  newly  computed  FABRICS.  By  using  the  partial  reconfigura¬ 
tion  capability  of  FPGAs,  the  size  of  these  FABRICs  is  kept  small  relative  to  the  memory  required 
for  storing  complete  FPGA  configurations.  A  FABRIC  replaces  a  faulty  resource  with  a  spare 
one.  Ideally,  we  would  like  to  always  have  a  spare  resource  in  the  neighborhood  of  the  fault.  The 
next  subsections  will  discuss  spare  allocation  strategies  for  PLBs  and  interconnect  that  attempt  to 
achieve  this  goal. 

Level  3  is  entered  as  the  last  recourse  when  all  the  spare  PLBs  or  spare  interconnect  resources  in 
the  working  area  have  been  used  for  fault  tolerance.  Then  we  use  some  of  the  roving  spares  pro¬ 
vided  by  the  two  STARs  to  bypass  faults.  This  level  is  referred  to  as  STAR  stealing.  By  making 
use  of  reconfiguration  and  alternately  stealing  from  one  STAR  and  then  from  the  other,  it  may  be 
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possible  to  continue  roving  with  both  STARs,  one  at  a  time.  In  this  case  we  lose  the  ability  to  test 
global  crosspoint  CIPs.  We  will  try  to  maintain  both  STARs  roving  one  at  a  time  for  as  long  as 
possible.  In  the  event  that  one  STAR  is  completely  used  up  (for  spares)  and  we  start  stealing  from 
the  other  STAR,  we  can  rove  with  only  a  partial  STAR.  Note  that  the  resources  taken  from  the 
STAR  will  no  longer  be  spare.  When  this  occurs,  the  extent  of  testing  will  be  reduced  with  every 
new  fault  since  some  tiles  will  no  longer  contain  BISTERs,  not  enough  fault-free  PLBs  may  be 
available  to  allow  accurate  diagnosis,  and  more  interconnect  resources  will  no  longer  be  testable. 


5.4  Fault  Tolerance  for  PLBs 

Because  of  the  roving  process,  the  system  logic  does  not  have  a  fixed  placement.  Figure  42  illus¬ 
trates  how  the  same  physical  PLB  is  time  shared  among  four  different  logic  cell  functions, 
depending  on  the  positions  of  the  STARs.  (A  logic  cell  function  is  the  function  mapped  to  one 
PLB.)  Our  FT  techniques  correctly  deal  with  this  dynamic  behavior.  Whenever  possible  we  reuse 
faulty  PLBs;  this  is  a  drastic  change  from  conventional  FT  techniques  that  throw  out  faulty 
resources.  This  technique  allows  us  to  handle  more  logic  faults  than  there  are  available  spares.  If 
reuse  is  not  possible,  we  compute  FABRICS.  When  reconfiguration  is  required,  we  have  devel¬ 
oped  optional  spare  logic  preallocation  strategies  to  decrease  the  time  required  for  reconfiguration 
and  decrease  the  impact  on  the  size  of  the  incremental  configuration  files.  Another  major  advan¬ 
tage  of  our  technique  over  conventional  techniques  is  that  we  can  handle  groups  of  faults  in  a  tight 
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Figure  42.  Example  of  Four  Logic  Cells  Time  Sharing  One  PLB 
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area.  Other  teehniques  limit  the  number  of  faults  they  ean  handle  by  providing  a  limited  number 
of  spare  logic  resources  in  a  subarea  or  tile  of  the  FPGA.  When  these  local  spares  are  exhausted, 
their  FT  techniques  fail.  In  addition  to  our  local  spare  allocation  strategies,  we  also  provide  a  glo¬ 
bal  reconfiguration  approach  that  makes  use  of  mini-max  matching  to  allow  a  large  number  of 
logic  faults  in  the  same  area  to  be  tolerated. 

5.4.1  Reusing  Faulty  PLBs 

PLBs  are  usually  composed  of  a  set  of  LUTs  and  FFs.  For  example,  if  one  FF  is  faulty,  that  FF 
can’t  be  used  by  the  system  function,  but  the  LUTs  and  the  other  FFs  may  be  used.  As  another 
example,  consider  a  LUT  with  a  memory  cell  stuck  at  0.  If  the  faulty  LUT  implements  a  logic 
function  where  the  faulty  memory  cell  is  set  to  0,  then  the  fault  is  tolerated  without  requiring  any 
reconfiguration  and  any  loss  of  system  resources.  If  the  needed  value  and  the  stuck  value  are  com¬ 
plementary,  or  if  the  faulty  LUT  is  used  as  a  RAM,  then  the  fault  is  not  compatible.  In  this  case 
there  are  two  choices.  We  can  reconfigure  the  system  function  to  completely  avoid  the  faulty  LUT, 
or  we  can  attempt  to  find  a  system  logic  function  that  can  make  use  of  the  faulty  memory  cell  in 
its  stuck-at  condition.  This  still  requires  reconfiguration,  but  we  have  eliminated  the  requirement 
of  a  spare  resource  for  the  fault. 

If  the  faulty  LUT/RAM  has  an  unused  input/address  bit,  another  way  to  tolerate  the  stuck  cell  is  to 
simply  use  the  half  of  the  LUT  that  does  not  contain  the  faulty  memory  cell.  For  example,  if  we 
have  a  4-input  LUT/RAM  with  16  memory  cells,  and  the  PLB  is  using  only  3  inputs,  we  can  still 
use  half  of  the  memory  cells  to  implement  a  3-input  function  or  a  RAM  with  3  address  bits. 

With  both  of  the  methods  described  above  we  increase  the  fault  tolerance  and  the  lifetime  of  the 
system.  The  usability  of  a  defective  LUT  has  to  be  analyzed  separately  for  each  one  of  the  four 
logic  cell  functions  of  the  system  function  that  may  be  relocated  over  it. 

As  stated  in  the  previous  subsection,  when  a  faulty  programmable  logic  resource  (a  LUT  or  a  FF) 
is  not  usable,  there  is  still  a  potential  for  a  PUB,  by  using  the  nonfaulty  part  of  the  PLB.  If  the 
PUB  contains  a  logic  cell  function  that  does  not  use  the  faulty  resource,  it  can  be  used  as  is  with¬ 
out  any  reconfiguration.  For  example,  the  fault  may  be  in  a  section  of  the  PLB  that  is  not  used  in 
the  relocating  logic  cell  function,  such  as  in  an  unused  LUT  or  FF,  or  it  may  affect  a  LUT  opera¬ 
tion  not  used  in  the  relocating  logic  cell  function,  such  as  multiplication.  Sometimes  a  PUB 
requires  local  reconfiguration  of  a  logic  cell  function  [33].  For  example,  the  fault  may  affect  one 
of  the  LUTs,  but  the  relocating  logic  cell  function  has  another  LUT  currently  unused  that  can  be 
considered  as  a  spare  and  connected  to  replace  the  faulty  LUT.  The  usability  of  a  defective  PLB  as 
a  PUB  also  must  be  analyzed  separately  for  each  one  of  the  four  logic  cell  functions  of  the  system 
logic  that  may  be  relocated  over  it. 

We  regard  usability  as  a  compatibility  relation  between  a  fault  in  a  PLB  and  a  desired  function  for 
that  PLB:  if  the  faulty  PLB  can  correctly  perform  the  function,  we  say  that  the  fault  is  compatible 
with  the  function.  Any  fault  is  compatible  with  the  “empty  function”  of  a  spare  PLB.  A  fault  that 
affects  only  sections  of  a  PLB  that  are  not  used  in  the  desired  function  is  also  compatible  with  that 
function.  A  defective  PLB  with  unused  parts  can  be  made  compatible  with  a  desired  function  by 
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remapping.  For  uniformity,  we  eonsider  a  fault- free  PLB  as  having  an  “empty  fault,”  which  is  then 
compatible  with  any  function. 


To  show  the  effect  of  PUBs  when  reconfiguring  for  logic  fault  tolerance,  we  used  several  bench¬ 
mark  circuits.  We  randomly  picked  a  number  of  PLBs  (and  a  LUT  or  FF  within  each  PLB)  to  be 
faulty.  Then  we  reconfigured  by  moving  any  logic  cell  function  that  was  not  compatible  with  a 
faulty  PLB  to  a  new  location.  We  increased  the  number  of  faults  until  we  could  no  longer  recon¬ 
figure  the  circuit.  In  Figure  43  we  see  representative  test  results  for  reconfiguring  with  and  with¬ 
out  using  PUBs.  The  data  for  the  Digital  Single  Sideband  Modulator  (DSSM)  circuit  show  that 


DSSM  -  Faults  Added  Until  Failure 


^PUBs  ---No  PUBs 
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Figure  43.  Number  of  Faults  Tolerated  with  and  without  PUBs 

the  number  of  logic  faults  that  can  be  tolerated  increases  with  the  use  of  PUBs.  They-axis  shows 
the  average  distance  that  an  incompatible  logic  cell  function  was  moved  to  be  placed  on  a  compat¬ 
ible  spare  PLB  or  PUB.  The  x-axis  shows  the  number  of  faulty  PLBs  injected  into  the  circuit. 
Without  the  use  of  PUBs,  reconfiguration  halts  when  the  number  of  faulty  PLBs  exceeds  the  num¬ 
ber  of  spares  in  the  system  working  area.  With  PUBs,  the  number  of  faulty  PLBs  actually  exceeds 
the  number  of  spare  PLBs  in  the  working  area. 

5.4.2  Bypassing  Incompatible  Faults 

If  a  defect  is  not  compatible  with  the  system  function,  the  system  function  must  be  reconfigured  to 
avoid  the  incompatible  fault.  One  approach  is  to  move  the  entire  logic  cell  function  to  a  compati¬ 
ble  spare  PLB  location.  Note  that  a  spare  PLB  may  not  be  fault  free,  since  we  allow  a  spare  PLB 
to  be  relocated  over  a  defective  one.  The  condition  for  replacement  is  to  find  a  spare  PLB  compat¬ 
ible  with  the  desired  function  (this  also  includes  the  fault-free  spare  case).  The  actual  replacement 
takes  place  in  the  next  roving  step,  where  the  reconfiguration  makes  the  relocating  logic  cell  func¬ 
tion  fall  on  a  compatible  spare  instead  of  the  incompatible  defective  PLB.  Having  the  compatible 
spare  in  the  neighborhood  of  the  defective  PLB  helps  minimize  the  extent  of  the  changes  in  the 
layout  of  the  system  function,  including  changes  in  the  delays  of  the  signals  that  have  to  be 
rerouted. 
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Moving  a  logic  cell  function  to  a  compatible  spare  loeation  will  work  as  long  as  1)  a  compatible 
spare  is  available  and  2)  no  two  logic  cell  functions  need  the  same  spare.  For  the  second  case,  we 
use  a  grid  matehing  teehnique  [1 1]  to  perform  reconfiguration  and  use  other  spares  available  in 
the  working  area  of  the  FPGA.  When  no  more  spares  are  available  in  the  working  area,  we  move 
to  STAR  stealing. 

FABRICS  for  single  faults  can  be  precompiled  and  stored  to  quickly  map  around  defective  PLBs. 
But  in  a  system  where  some  faults  have  been  bypassed,  the  initially  computed  FABRICS  may  no 
longer  be  valid;  so  after  one  FABRIC  is  applied,  TREC  updates  the  other  FABRICS  that  may  have 
been  afifeeted  by  the  incremental  reconfiguration.  The  initial  FABRICS  are  preeomputed  under  the 
assumption  of  a  single  fault,  so  when  multiple  faults  occur  in  the  same  region,  TREC  has  to  eom- 
pute  a  new  FABRIC  that  concurrently  bypasses  all  the  faults.  This  takes  longer  than  using  the  pre¬ 
compiled  FABRICS,  but  it  does  not  interfere  with  the  system  function  execution  since  a  STAR  is 
currently  parked  over  the  faults.  After  several  faults  have  oceurred  in  the  same  area,  it  is  possible 
that  all  the  spare  PLBs  have  been  used  to  replace  defective  PLBs,  so  it  may  become  necessary  to 
remap  the  entire  system  function  and  reallocate  the  remaining  spares  to  achieve  a  more  uniform 
spare  distribution.  Processing  for  this  is  done  off-line  by  the  TREC,  and  it  requires  the  entire  sys¬ 
tem  configuration  file  be  reloaded  during  the  next  rove. 

5.4.3  Preallocating  Spare  Resources 

We  deseribe  two  optional  spare  PLB  alloeation  strategies  for  the  working  area  of  the  FPGA,  but 
we  are  not  limited  to  just  using  these.  In  the  first  strategy,  we  guarantee  that  for  working  area  PLB 
utilization  less  than  80  pereent  (which  is  typical  for  most  applications),  every  system  logie  eell 
funetion  will  be  adjacent  to  at  least  one  spare  PLB.  For  the  second  strategy,  for  a  working  area 
PLB  utilization  less  than  92  percent,  every  system  logic  cell  function  is  no  further  than  one  PLB 
from  a  spare. 

Initial  placement  and  routing  of  the  system  logic  is  done  in  the  {N-2)  by  (A^-2)  working  area  of  the 
FPGA.  For  our  first  strategy,  we  will  constrain  the  design  so  that  at  least  20  percent  of  the  working 
area  PLBs  are  left  spares.  This  is  not  excessive,  sinee  typieal  PLB  utilization  is  less  than  80  per¬ 
cent  for  most  applieations.  Higher  utilizations  make  the  design  dififieult  to  route.  We  can  show  that 
if  this  constraint  is  satisfied,  we  can  always  find  a  placement  of  the  system  logic  so  that  every  sys¬ 
tem  logic  cell  function  is  adjacent  to  at  least  one  spare  PLB  in  the  working  area  of  the  FPGA. 
Figure  44a  illustrates  such  a  placement  for  a  12  by  12  FPGA.  To  foree  standard  design  tools  to 
generate  evenly  distributed  spares,  we  reserve  the  location  of  spare  PLBs  by  preplacing  dummy 
logic  cell  funetions  in  spare  PLBs  before  exeeuting  the  standard  placement  algorithm.  In  our 
implementation,  for  eaeh  system  logic  cell  funetion,  we  select  one  of  its  adjaeent  spares  as  its  pre¬ 
ferred  replacement.  Note  that  the  same  spare  may  be  designated  as  the  preferred  replacement  for 
several  working  logie  cell  funetions.  In  Figure  44,  the  following  equations  were  used  for  deter¬ 
mining  if  a  position  with  working  area  coordinates  {r,c)  should  be  reserved  as  a  spare.  If  we  let 
5=r(mod  5)  and  if  c(mod  5)  =  (25  +  l)(mod  5),  then  working  area  location  r,c  is  a  spare  location. 
Additional  spares  are  needed  on  the  edge  of  the  working  area  (labeled  with  e  in  Figure  44)  to 
ensure  that  fringe  PLBs  are  sufficiently  close  to  a  matching  spare  PLB  in  order  to  meet  the  defined 
distance  criteria.  Figure  44b  demonstrates  the  same  concepts  for  our  second  strategy.  Note  that  we 
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a)  each  logic  cell  function  is  adjacent  to  a  spare  b)  each  logic  cell  function  is  no  more  than  1  PLB  from  a  spare 

Figure  44.  Spare  Cell  Allocation  Patterns 


don’t  expect  the  PLB  utilization  to  be  so  high  in  the  initial  design,  but  only  after  a  number  of 
defective  PLBs  have  been  replaced  by  spares. 

Table  7  shows  the  performance  penalty  for  the  benchmark  circuits  described  earlier  when  imple¬ 
mented  in  an  ORCA  2C  series  FPGA.  The  column  labeled  “System  w/STARs”  gives  the  worst- 
case  performance  for  the  benchmark  circuit  with  STARs  inserted  and  no  preallocated  spare  PLBs. 
The  column  labeled  “System  w/STARs  and  FT  spares”  shows  the  worst-case  operating  frequency 
for  the  system  function  with  preallocated  spares  for  each  of  the  10  STAR  positions.  The  last  col¬ 
umn  shows  the  percent  difference  in  operating  frequency  between  the  circuits  without  and  with 
preallocated  spares.  For  the  circuits  tested,  the  percent  difference  ranges  between  2.5  and  15.1 
percent.  It  should  be  noted  that  preallocated  spares  are  not  a  requirement  of  our  FT  method;  how¬ 
ever,  they  reduce  the  time  for  incremental  reconfiguration  for  low  numbers  of  faults. 

Table  7.  Performance  Penalty  with  Spare  PLBs 


Circuit 

System  w/ 
STARS 

System  w/STARs 
and  FT  spares 

%  diff 

Huffman 

98.9 

116.6 

15.1 

Fibonacci 

130.8 

136.0 

3.8 

Wallace  Tree  Multiplier 

149.7 

166.8 

10.3 

Digital  Single  Sideband  Modulator 

76.6 

83.3 

7.2 

Hilbert 

87.5 

90.6 

3.4 

Random  Num  Generator 

56.4 

60.1 

6.2 

Mono-FFT 

120.8 

123.9 

2.5 

5.4.4  Multiple  Logic  Faults 

When  two  logic  cell  functions  are  assigned  the  same  spare  PLB  location,  there  is  a  conflict  if  they 
both  need  to  be  remapped  to  a  spare  location.  When  this  occurs,  we  use  a  matching  strategy  to 
match  logic  cell  functions  to  compatible  spare  locations.  Minimax  matching  matches  multiple 
logic  cell  functions  (logic  cell  functions  currently  mapped  to  incompatible  PLB  locations)  to  com¬ 
patible  spare  locations  such  that  the  maximum  distance  between  any  logic  cell  function  and  its 
corresponding  spare  location  is  minimized.  We  take  this  approach  to  reduce  the  amount  of  recon- 
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figuration  required  when  moving  logie  eell  functions  to  spare  locations  and  to  reduce  any  adverse 
effect  on  system  operating  speed. 

5.5  Fault  Tolerance  for  Interconnect 

Because  of  the  roving  process,  the  system  interconnect  is  not  fixed.  Similar  to  system  logic,  pro¬ 
grammable  routing  resources  used  to  connect  signal  nets  may  vary  depending  on  the  position  of 
the  STARS.  We  have  found  that  for  a  large  number  of  interconnect  faults,  the  system  function  does 
not  need  to  be  reconfigured.  A  large  part  of  our  interconnect  fault  tolerance  approach  deals  with 
determining  whether  an  interconnect  fault  is  compatible  with  the  layout  of  the  system  function. 
We  only  reconfigure  the  system  function  to  tolerate  interconnect  faults  when  the  layout  is  incom¬ 
patible  with  the  faults.  Below  we  describe  the  process  of  determining  and  reusing  faulty  intercon¬ 
nect  resources.  For  the  case  when  we  have  an  incompatible  layout,  we  describe  our  multistep 
approach  for  incremental  system  configuration  to  avoid  the  incompatible  faults.  We  also  describe 
an  optional  strategy  for  reserving  spare  programmable  interconnect  resources. 

5.5.1  Reusing  Defective  Interconnect 

Reusing  defective  interconnect  is  a  significant  departure  from  the  way  most  conventional  FT  tech¬ 
niques  work,  which  is  to  bypass  the  faulty  interconnect  resources.  In  our  approach,  we  allow  a 
signal  net  to  be  routed  through  faulty  interconnect  resources  (segments  and  CIPs)  whenever  possi¬ 
ble.  This  increases  the  effective  circuit  routability  and  leads  to  more  graceful  degradation  and 
longer  mission  life. 

We  define  a  layout  of  a  programmable  interconnect  network  as  the  set  of  positions  (open  or 
closed)  for  all  its  CIPs,  needed  for  a  specific  routing  of  the  netlist.  Similar  to  [47],  we  say  that  an 
interconnect  fault /is  compatible  with  a  layout  L  (f~  L)  if  and  only  if f  does  not  change  the  con¬ 
nections  implemented  by  L.  If /~  L,  then  L  can  tolerate / and  we  don't  have  to  reconfigure  to 
bypass  f. 

For  example,  in  Figure  45,  assume  a  layout  L  where  breakpoint  CIPs  Bj  and  B2  are  open,  cross- 
point  CIPs  X]  and  X2  are  closed,  and  segment  Z  is  not  used.  Then  an  open  on  is  compatible 
with  L,  while  an  open  on  X 2X2  is  not.  More  interestingly,  the  short  between  B^B 2  and  Z  is  also 
compatible  with  L,  since  Z  does  not  carry  any  signal.  Thus  the  routed  signal  net  can  use  the 
shorted  segment  B1B2.  Table  8  gives  detailed  conditions  that  need  to  be  met  for  single-fault  com¬ 
patibility. 


Figure  45.  Illustration  for  Fault  Compatibility 


Note  that  the  compatibility  relation  between  a  fault / and  a  layout  L  may  be  affected  by  the  pres¬ 
ence  of  another  fault  g.  This  happens  because  g  changes  the  structure  of  the  programmable  net- 
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Table  8.  Single-Fault  Compatibility  Conditions 


/ 

Condition  for  f^L 

segment  open  or 

CIP  stuek-open  or 
segment  stuek-at 

L  does  not  transmit  values 
through  the  faulty  resouree 

CIP  stuek-elosed 

L  uses  the  CIP  or 

L  drives  values  only  on  at  most 
one  of  the  shorted  segments 

short  between 
segments 

L  drives  values  only  on  at  most 
one  of  the  shorted  segments  or 

L  uses  the  shorted  segments  for 
the  same  signal 

work  on  which  L  is  implemented,  mis  interaction  between  taults  is  not  considered  in  [47],  For 
example,  in  Figure  46,  assume  a  layout  L  where  crosspoint  CIPsX;  andX2  and  are  closed,  and 
breakpoint  CIPs  and  B2  are  open.  Segment  ^4  is  used  by  signal  Sy  via  crosspoint  Xy,  segment  C 


is  used  by  signal  S2  via  crosspoint  X2,  and  segment  B  is  not  used.  Let  /be  the  short  between  seg- 

/  g 


X, 


B, 


By 


Xy 


Figure  46.  Multiple  Fault  Interaction 


ments  A  and  B,  and  let  g  be  the  breakpoint  B2  stuck-closed.  The  single  fault /is  compatible  with  L, 
as  is  the  single  fault  g.  However,  the  multiple  fault  {/  g}  shorts  the  signals  Sy  and  S2,  so  it  is  not 
compatible  with  L.  Figure  47  shows  the  opposite  situation,  where  an  incompatible  fault  (open  on 
segment  X)  becomes  compatible  with  the  layout  in  the  presence  of  a  short  between  A  and  B, 
because  the  short  restores  the  connection  cut  by  the  open. 


Figure  47.  Compatible 


Multiple  Fault 


Our  procedures  rely  on  a  network  model  coupled  with  a  layout  database.  The  network  model  is  a 
graph  that  represents  the  topological  relations  between  the  components  of  the  programmable 
interconnect  network  -  wire  segments,  PLB  pins,  FPGA  pins,  and  CIPs.  The  layout  database 
defines  the  CIP  settings  required  to  route  all  the  signal  nets  in  the  circuit.  From  the  layout  data¬ 
base  we  can  retrieve  all  the  segments  and  CIPs  used  by  a  given  signal,  and  determine  which  signal 
is  routed  through  a  given  segment  or  CIP.  In  a  fault-free  network,  a  segment  may  be  driven  by 
only  one  signal. 

To  record  faults  in  the  network  model,  every  CIP  and  segment  has  an  additional  field  that  indi¬ 
cates  its  fault  status:  a  CIP  can  be  fault  free,  stuck-on,  or  stuck-off,  and  a  segment  can  be  fault 
free,  open,  stuck,  or  shorted.  For  a  shorted  segment,  we  also  record  the  identity  of  its  partner  -  the 
other  segment  involved  in  the  short.  As  illustrated  in  Figure  48,  the  presence  of  crosspoint  CIPs 
on  a  shorted  segment  complicates  the  reuse  problem,  because  the  actual  short  may  involve  any 
one  of  the  nine  pairs  of  portions  of  the  two  wire  segments  (AX),  (A,Y), (C,Y),  (C,Z).  While  this 


56 


Figure  48.  Where  Is  the  Short? 


resolution  is  not  needed  to  model  the  short,  it  may  lead  to  a  problem  if  an  open  also  oecurs  on  one 
of  the  original  segments:  for  example,  if  B  becomes  open,  the  topology  of  the  network  is  different 
depending  on  which  side  of  the  open  the  short  resides.  In  such  a  case  we  use  a  separate  diagnosis 
procedure  [54]  to  determine  the  exact  topology  caused  by  the  multiple  fault. 

We  determine  the  faulty  resources  that  may  be  reused  (and  the  ones  that  need  to  be  bypassed) 
indirectly,  by  finding  the  signal  nets  that  have  to  be  rerouted.  This  is  more  efficient,  since  our 
incremental  router  works  based  on  signal  nets,  and  the  same  net  may  be  affected  by  more  than  one 
fault.  We  say  that  a  signal  net  S  (part  of  layout  L)  is  compatible  with  a  group  of  faults  if  and  only 
if  S  can  connect  its  source  to  all  of  its  sinks  in  the  presence  of  the  faults,  and  no  other  signal  drives 
the  segments  of  S.  A  signal  has  to  be  rerouted  only  if  it  is  incompatible  with  the  current  group  of 
faults. 

The  procedure  to  find  the  incompatible  nets  works  as  follows.  We  first  insert  all  the  located  faults 
in  the  network  model.  Next,  we  process  the  faults  to  find  all  the  signal  nets  affected  by  faults. 
Then  we  trace  every  affected  signal  S  from  its  source  to  all  its  sinks,  keeping  track  of  the  segments 
used.  Tracing  stops  at  open  segments,  at  stuck-open  CIPs,  and  at  stuck  segments,  but  it  does  go 
through  stuck-on  CIPs,  and  it  jumps  from  a  shorted  segment  to  its  partner.  If  faults  prevent  S  to 
reach  at  least  one  of  its  destinations,  then  S  is  incompatible.  We  record  all  the  segments  reached 
by  every  traced  net.  After  all  signals  are  traced,  we  can  identify  the  segments  driven  by  more  than 
one  signal;  the  signals  that  drive  these  segments  are  shorted,  and  only  one  of  them  may  be  allowed 
to  continue  to  drive  the  shorted  segments,  while  the  other  ones  will  be  marked  as  incompatible.  To 
determine  which  signal  to  leave  in  place,  first  we  exclude  the  ones  already  found  to  be  incompati¬ 
ble  because  the  trace  has  not  reached  some  of  their  destinations.  To  select  among  the  remaining 
signals,  we  analyze  the  timing  margins  (slacks)  of  their  paths,  and  chose  the  one  with  the  least 
slack.  In  this  way,  we  minimize  the  impact  of  rerouting  on  the  system  timing,  since  the  signals  to 
be  rerouted  have  more  slack.  We  break  ties  by  selecting  the  net  most  difficult  to  reroute  as  the  one 
left  in  place.  Rerouting  difficulty  for  a  net  can  be  approximated  by  its  length,  since  shorter  nets  are 
likely  to  be  easier  to  reroute.  Other  criterion  could  be  based  on  layout  density. 

For  example,  in  Figure  46,  tracing  S ;  will  go  through  J,  B,  C,  and  continue  along  S2,  while  tracing 
S2  will  go  through  C,B,  A,  and  continue  along  Sj.  Since  the  traced  segments  are  recorded  as  being 
driven  by  both  Sj  and  S2,  these  signals  are  shorted.  Assuming  that  neither  S';  nor  S2  have  other 
faults,  we  select  one  of  them  to  be  left  unchanged  (based  on  the  analysis  described  above),  while 
the  other  will  be  allowed  to  continue  to  drive  the  shorted  segments.  This  is  possible  since  after 
rerouting,  the  multiple  fault  will  become  compatible  with  the  new  layout. 

Tracing  the  signal  in  Figure  47  will  not  go  through  the  open  segment  X,  but  it  will  reach  all  its 
destinations  following  the  connection  between  A  and  B  created  by  the  short.  Hence  the  signal  is 
compatible  with  the  multiple  fault,  and  it  does  not  have  to  be  rerouted. 
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In  Figure  49,  assume  that  the  source  of  i";  is  on  the  left.  Then  tracing  Sj  stops  at  the  open  segment 
X,  and  tracing  S2  reaches  B  via  the  short  between  B  and  C.  Because  of  the  open,  B  is  driven  by 
only  one  signal  {S2),  so  that  there  is  no  other  signal  shorted  to  S2,  which  is  hence  compatible  with 
the  multiple  fault,  and  does  not  have  to  be  rerouted. 


Figure  49.  Signal  Compatibility 

The  fault  status  of  CIPs  and  segments  is  recorded  in  the  network  model.  An  ideal  router  would 
correctly  use  this  information,  so  that  it  will  never  route  a  net  through  a  stuck-open  CIP,  an  open 
segment,  or  a  stuck-at  segment.  However,  our  current  implementation  relies  on  a  commercial 
router  that  does  not  provide  such  features.  Our  workaround  is  to  mark  the  incompatible  faults  by 
creating  dummy  nets  that  use  the  corresponding  faulty  resources.  These  dummy  nets  have  no 
source  or  sink  and  are  created  only  to  tie  up  unusable  CIPs  and  segments.  They  are  never  ripped 
up,  and  thus  they  will  not  be  used  to  route  any  real  nets.  However,  this  solution  is  less  flexible  than 
the  ideal  one. 

Note  that  routing  done  in  a  faulty  interconnect  network  can  take  advantage  of  faults  to  route  signal 
nets  in  ways  that  would  have  been  impossible  in  the  fault-free  network.  For  example.  Figure  50 
shows  an  open  that  allows  the  use  of  crosspoints  Xj  andX2  by  two  different  signals,  Sj  and  S2;  in 
the  fault-free  network,  they  may  be  used  only  by  the  same  signal. 


"1  M  r 

L _ ^ 

V  ^ 

J - 7T - ^ 

j 

— ^ — 

Figure  50.  Taking  Advantage  of  an  Open  Segment 

The  described  solution  is  an  ideal  one  since  it  assumes  that  the  router  can  read  the  layout  database 
in  which  we  maintain  the  fault  status  of  CIPs  and  segments.  In  our  implementation,  we  use  a  com¬ 
mercial  router  that  requires  us  to  reserve  a  CIP  and  two  wire  segments  to  prevent  the  incorrect  use 
of  faulty  resources.  However,  we  can  reuse  shorts.  When  an  incompatible  short  occurs  (two  or 
more  signals  using  the  same  net  segment  group),  we  will  compare  the  signals  against  the  critical 
path.  If  one  of  the  signals  is  in  the  critical  path,  we  will  keep  it  intact  and  ripup  the  other  signals. 
If  neither  signal  is  in  the  critical  path,  we  will  pick  the  one  to  ripup  based  on  net  length,  since 
shorter  signals  should  be  easier  to  reroute.  We  cannot  make  use  of  opens  with  the  commercial 
router;  we  can  only  use  one  portion  of  the  open  segments.  The  ability  to  use  multiple  open  seg¬ 
ment  in  what  was  once  a  continuous  wire  segment  would  require  a  specialized  router.  Our  imple¬ 
mented  solution  relies  on  marking  the  incompatible  faults  by  creating  dummy  nets  that  use  the 
corresponding  faulty  resources.  These  dummy  nets  have  no  source  or  sink  and  are  just  used  to  tie 
up  unusable  CIPs  and  segments.  They  are  never  ripped  up,  and  thus  they  will  not  be  used  to  route 
any  real  nets.  However,  this  solution  is  less  flexible  than  the  ideal  one. 

As  a  side  note,  we  discovered  a  shortcoming  (relative  to  fault  tolerance)  of  the  critical  path  analy¬ 
sis  tool  (tree)  used  in  our  implementation.  This  is  worth  mentioning  because  other  such  static  tim- 
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ing  analysis  tools  also  suffer  from  the  same  limitations.  The  speeific  problem  relates  to  shorts 
(either  shorted  segments  or  stuck-closed  CIPs).  When  a  short  occurs,  there  may  be  additional 
capacitance  added  to  segments  used  to  route  signal  nets.  In  this  case,  tree  will  return  a  maximum 
clock  speed  or  a  minimum  propagation  delay  that  is  inaccurate  and  may  cause  the  clock  to  be  set 
at  an  inappropriate  speed.  This  is  a  problem  only  if  the  additional  delay,  not  visible  to  tree,  creates 
a  path  delay  longer  than  the  clock  period.  To  accurately  set  the  clock  speed,  the  static  timing  anal¬ 
ysis  tool  must  work  with  the  updated  model  illustrated  in  Figure  45. 


5.5.2  Bypassing  Incompatible  Interconnect  Faults 

Routing  and  incremental  routing  are  NP-hard  problems.  For  dense  circuits,  finding  one  solution 
where  the  circuit  is  completely  routed  may  be  impossible.  If  a  solution  is  found,  there  may  not  be 
any  alternate  solutions.  Additionally,  routing  run-times  may  be  exceedingly  long.  Thus  in  our 
approach  to  interconnect  fault  tolerance,  we  try  to  maintain  the  original  circuit  routing  as  much  as 
possible.  This  improves  the  possibility  of  obtaining  a  100  percent  routed  circuit,  and  reduces  the 
amount  of  time  required  to  incrementally  route.  Given  an  interconnect  fault / that  is  not  compati¬ 
ble  with  the  layout,  we  use  a  multistage  fault-bypassing  technique;  if  we  have  a  set  of  faults,  we 
handle  them  the  same  way  we  handle  a  single  fault.  We  must  perform  our  routing  fault  tolerance 
for  each  of  the  STAR  positions.  Each  step  in  our  process  is  described  below. 


Stage  1:  In  this  stage  we  perform  local  logic  reconfiguration  to  bypass  interconnect  faults  that 
will  not  allow  some  signal  nets  to  connect  to  specific  inputs  or  outputs  on  a  PLB.  If  an  input  to  a 
PLB  is  blocked,  we  try  to  move  the  function  of  the  blocked  resource  to  an  unused  resource  in  the 
same  PLB.  For  example.  Figure  51  shows  an  example  of  a  stuck-open  CIP  denying  access  to  an 
input  pin  on  LUT  B.  If  LUT  A  is  not  being  used,  there  is  a  possibility  we  can  move  the  function 
from  LUT  B  to  LUT  A;  keeping  the  LUT  function  within  the  same  PLB  is  less  likely  to  introduce 
delays.  For  this,  we  reconfigure  the  PLB  logic  and  we  partially  (or  fully)  ripup  the  signal  nets 
going  to  the  inputs  and  outputs  of  LUT  A  [33].  We  add  the  signals  to  the  set  S  of  unrouted  nets  for 
later  incremental  rerouting.  If  such  a  transformation  is  not  possible  within  the  blocked  PLB,  we 
partially  (or  fully)  ripup  all  the  signal  nets  attached  to  the  PLB,  move  the  logic  cell  function  of 
the  PLB  to  a  new  compatible  spare  location  (using  mini-max  grid  matching),  and  add  the  ripped 
up  signal  nets  to  S. 


a)  initial  route  to  LUT  B  b)  function  moved  to  LUT  A 

Figure  51.  Example  Stuck-Open  CIP 


Stage  2:  In  this  stage  we  perform  partial  signal  net  ripup.  We  partially  ripup  any  signal  net  5  that 
is  incompatible  with / and  add  it  to  our  set  S  of  unrouted  signal  nets.  If  /is  in  an  uncongested  glo¬ 
bal  area  or  a  bypassable  local  area,  it  may  be  possible  to  ripup  the  signal  net  without  disturbing 
other  routed  signal  nets.  For  example.  Figure  52  shows  a  signal  net  that  was  originally  routed 
through  a  segment  between  crosspoint  CIPs  Xj  and  X2  that  is  now  open.  If  X5  and  X^  aren’t  being 
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used  in  other  signal  nets,  the  part  of  the  signal  route  between  X;  andX2  ean  be  ripped  up  and 
rerouted  through  Xj,  Xj,  X^,  and  X^. 


Figure  52.  Bypassing  an  Open  Segment 


Stage  3:  In  this  stage,  we  attempt  to  incrementally  reroute  each  s  e  S  without  changing  the  route 
of  other  nets  not  affected  by  faults.  Initially  S  contains  all  nets  affected  by  faults.  Note  that  we 
mark  any / so  the  router  won’t  use /in  any  subsequent  route.  If  we  are  successful,  we  can  stop  our 
interconnect  FT  reconfiguration  at  this  stage.  Otherwise,  we  move  on  to  subsequent  stages. 

Stage  4:  After  Stages  1  through  3,  if  we  have  unrouted  nets  (S  is  not  empty),  we  enter  Stage  4. 
This  is  likely  to  occur  in  areas  of  congested  routing,  where  most  routing  resources  have  been 
already  used  or/and  are  faulty.  Here  we  use  a  more  global  transformation,  ripping  up  all  the  signal 
nets  within  by  a  variable-sized  window  in  the  area  around  the  unrouted  signal  nets.  If  we  are  not 
successful,  we  increment  the  window  size  and  repeat  the  process  until  we  reach  the  boundaries  of 
the  FPGA.  If  we  are  successful,  we  stop.  If  at  the  end  we  still  have  unroutes,  we  proceed  to  Stage 
5. 

Stage  5:  At  this  stage  we  have  been  unable  to  achieve  a  100  percent  routed  circuit.  Thus  to 
achieve  a  routed  circuit,  we  require  more  drastic  measures.  In  this  stage,  we  verify  that  all  faulty 
logic  and  interconnect  resources  are  properly  marked  as  either  partially  or  completely  unusable. 
Then  within  areas  reserved  for  STARs,  we  perform  system  function  place  and  route  with  no  preal¬ 
located  spare  resources  in  place.  In  the  event  that  this  is  successful,  we  can  incrementally  reinsert 
spare  resources  for  future  use.  Note  that  the  commercial  place  and  route  tools  we  use  for  imple¬ 
mentation  do  not  have  the  ability  to  make  use  of  partially  usable  logic  or  interconnect  resources. 
For  this  we  require  special  tools  with  modifications  to  check  for  compatibility. 

If  Stage  5  is  not  successful,  we  can  steal  resources  from  a  STAR  to  increase  available  resources 
for  placing  and  routing  the  system  function.  With  one  STAR  still  available  for  testing,  we  retain 
the  ability  to  test  all  logic  and  some  of  the  interconnect  resources. 

5.5.3  Preallocating  Spare  Resources 

Our  approach  does  not  require  that  we  preallocate  spare  interconnect  resources,  since  typically  an 
FPGA  will  have  many  unused  interconnect  resources.  Thus  we  have  implemented  an  optional 
spare  interconnect  preallocation  strategy  to  evenly  distribute  the  spare  interconnects  across  the 
FPGA.  This  will  increase  the  likelihood  of  having  a  spare  in  the  vicinity  of  the  fault,  which  in  turn 
will  reduce  fault  avoidance  reconfiguration  time,  will  increase  the  probability  of  a  successful 
incremental  routing,  and  minimize  the  impact  of  reconfiguration  on  nonfaulty  signal  nets.  Preallo¬ 
cating  spare  interconnect  resources  also  has  disadvantages.  There  is  an  inherent  increase  in  the 
initial  circuit  place  and  route  time,  and  for  dense  circuits  performance  can  also  be  degraded. 
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Our  spare  interconnect  preallocation  strategy  is  based  on  routing  a  set  of  dummy  signals  that  are 
evenly  distributed  across  the  FPGA.  To  accomplish  this,  we  reserve  programmable  routing 
resources  by  using  them  to  route  dummy  signal  nets  that  have  no  sources  or  sinks.  Below  we  list 
the  sequence  for  inserting  spare  interconnect  resources,  and  in  the  rest  of  this  section  we  describe 
the  steps. 

1)  Define  a  set  of  dummy  signals. 

2)  Route  the  set  of  dummy  signals. 

3)  Place  and  route  the  system  application  function. 

4)  Remove  the  dummy  signals. 

In  step  1,  we  create  a  parameterized  set  of  dummy  signal  nets.  The  parameters  controlling  the 
type,  number,  and  direction  of  dummy  signal  nets  form  a  triple  (t,  w,  d).  The  variable  t  controls  the 
type  of  interconnect  used  for  the  dummy  signal  net.  Types  range  from  segments  that  are  one  PLB 
in  length  to  segments  that  span  the  entire  FPGA.  The  variable  w  defines  the  number  of  spares  of 
type  t  to  be  reserved  in  each  channel  of  the  FPGA.  The  variable  d  defines  which  channels  are  used 
for  the  dummy  signal  nets.  Valid  values  of  d  include  H  (horizontal),  V  (vertical),  B  (both  horizon¬ 
tal  and  vertical). 

In  step  2,  we  route  the  dummy  signals  defined  in  step  1 .  We  accomplish  this  step  using  a  special 
tool  that  reserves  specific,  user-defined  interconnect  resources  (so  they  will  not  be  used  by  the 
general  router  to  route  the  circuit)  for  spares. 

If  the  system  function  will  not  route  in  step  3,  we  go  back  and  reduce  the  number  of  interconnect 
resources  reserved  in  step  1  by  reducing  w  or/and  d.  Then  in  step  4  we  remove  the  dummy  signals 
(and  any  dummy  logic  reserved  for  logic  fault  tolerance),  leaving  the  interconnect  (and  logic) 
spares  available  for  fault  bypassing. 

5.6  Fault  Tolerance  Summary 

In  this  section  we  have  described  our  FT  approach  for  both  logic  and  interconnect.  In  our 
approach,  we  have  introduced  several  new  concepts  for  both  logic  and  interconnect  fault  toler¬ 
ance.  Since  determination  of  all  FT  configurations  (if  they  are  required)  takes  place  with  one  or 
both  STARS  parked  over  the  faults,  the  system  function  is  free  to  continue  executing,  unaffected 
by  any  faults.  Thus,  when  used  with  the  roving  STARs  approach  for  test  and  diagnosis  of  faults  on 
FPGAs,  the  approach  is  on-line. 

Whenever  possible  we  make  use  of  faulty  programmable  resources.  For  logic  we  use  the  fault  if  it 
is  compatible  with  the  function  programmed  in  the  faulty  resource.  If  the  fault  is  not  compatible 
with  the  current  logic,  we  attempt  to  find  logic  that  is  compatible  with  the  fault.  If  that  fails,  we 
use  the  faulty  PLB  as  a  PUB,  and  make  use  of  the  nonfaulty  portions  of  the  PLB.  This  technique 
is  a  big  improvement  over  previous  techniques  that  bypassed  the  faulty  PLB.  We  have  shown  that 
we  can  actually  continue  to  execute  even  when  the  number  of  faulty  PLBs  exceeds  the  number  of 
spare  PLB  resources. 
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When  we  do  reeonfigure  for  logie  fault  toleranee,  we  make  use  of  optional,  preallocated  local 
spare  resources  whenever  possible.  Our  spare  resource  allocation  strategies  guarantee  local  (adja¬ 
cent)  spares  if  the  logic  resource  usage  is  less  than  80  percent.  If  local  spares  are  not  available,  or 
if  a  group  of  PLBs  become  faulty,  we  use  a  mini-max  grid  matching  strategy  to  match  faulty  logic 
cell  locations  to  compatible  spare  resources  such  that  the  amount  of  reconfiguration  is  minimized. 
We  have  introduced  the  idea  of  a  programmable  clock  for  fault  tolerance.  This  allows  the  clock  to 
initially  be  set  to  its  fastest  possible  rate,  and  the  clock  is  only  reduced  as  necessary  when  faults 
affect  the  circuit’s  critical  paths. 

When  using  faulty  interconnect  we  consider  not  only  single  faults  but  also  multiple  faults.  For 
sparse  circuits,  most  of  the  interconnect  faults  that  can  occur  are  compatible  with  the  circuit  lay¬ 
out.  If  the  interconnect  faults  are  compatible  with  the  current  configuration  of  the  circuit  layout 
we  do  not  reconfigure.  This  is  one  way  to  increase  the  fault  tolerance  of  the  circuit  and  extend  the 
lifetime  of  the  circuit.  When  reconfiguring  to  avoid  interconnect  faults,  we  use  a  five-step  process 
to  determine  FABRICS.  We  also  presented  an  optional  interconnect  spare  allocation  strategy  to 
evenly  distribute  spare  interconnect  resources  across  the  FPGA. 
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6.  Summary  and  Conclusions 


We  have  developed  an  integrated  approach  for  on-line  FPGA  testing,  diagnosis,  and  fault  toler¬ 
ance,  applicable  to  any  FPGA  supporting  incremental  RTR.  Its  central  concept  of  roving  STARs 
sets  aside  two  sections  of  the  FPGA  in  which  self-testing  goes  on  without  disturbing  the  normal 
system  activity  in  the  rest  of  the  chip.  The  roving  of  the  STARs  periodically  brings  every  section 
of  the  device  under  test.  Our  approach  guarantees  complete  testing  of  both  PLBs  and  intercon¬ 
nect,  and  does  not  require  any  part  of  the  chip  (including  the  configuration  memory)  to  be  fault 
free.  A  STAR  consists  of  several  independent  BISTERs,  which  concurrently  test  disjoint  tiles  of 
PLBs  and  interconnect  resources  within  the  STAR.  The  logic  BISTER  is  repeatedly  configured 
using  a  rotating  strategy  so  that  every  PLB  in  its  tile  is  completely  tested  in  every  mode  of  opera¬ 
tion,  and  practically  any  combination  of  faulty  PLBs  is  guaranteed  to  be  detected.  The  intercon¬ 
nect  BISTER  is  repeatedly  configured  to  completely  test  all  interconnect  resources  (both  global 
and  local)  within  the  STAR  for  shorts  and  opens  as  well  as  CIP  faults  and  faults  affecting  the  con¬ 
figuration  memory  bits  that  control  the  CIPs.  All  test-related  activities,  including  STAR  reconfig¬ 
urations,  are  done  without  disturbing  the  normal  system  operation. 

Our  adaptive  diagnosis  technique  guarantees  maximal  diagnostic  resolution  in  locating  defective 
PLBs  and  interconnect  faults.  In  addition  to  reporting  a  faulty  PLB,  we  can  also  identify  its  failing 
internal  module  or  its  failing  mode  of  operation.  This  level  of  diagnostic  accuracy  provides  the 
basis  of  a  new  form  of  fault  tolerance,  where  a  defective  PLB  is  allowed  to  be  used  in  the  system 
logic,  provided  that  its  intended  operation  is  not  affected  by  the  identified  faults.  Similarly  we  can 
diagnosis  interconnect  faults  to  a  level  of  resolution  that  facilitates  the  reuse  of  faulty  interconnect 
resources.  This  reuse  of  fault  logic  and  routing  resources  results  in  a  more  graceful  degradation 
and  longer  mission  life. 

We  have  developed  a  dynamic  FT  method,  where  both  spare  interconnect  and  spare  PLB 
resources  needed  to  bypass  a  fault  are  allocated  only  after  the  fault  has  been  detected  and  diag¬ 
nosed,  and  the  spare  resources  are  always  present  in  the  neighborhood  of  the  located  fault.  Thus 
fault  reconfiguration  paths  may  be  much  shorter  than  in  other  methods.  Since  the  detected  faults 
are  always  in  a  spare  area,  the  normal  system  operation  need  not  be  interrupted  to  diagnose  and 
bypass  the  fault. 

We  have  implemented  this  approach  using  commercially  available  FPGAs  -  the  ORCA  series  2C 
and  2CA.  We  have  successfully  used  our  integrated  approach  for  both  fault- free  and  faulty  FPGAs 
that  included  logic  and  interconnect  faults.  To  further  evaluate  our  testing,  diagnosis,  and  FT  tech¬ 
niques,  we  have  developed  a  fault  injection  emulation  capability  that  allows  us  to  inject  any  num¬ 
ber  of  logic  and  routing  faults  into  any  region  of  the  FPGA  before  and/or  during  operation  of  the 
FPGA  with  the  system  function  and  roving  STARs.  Our  roving  STARs  approach  has  been  applied 
to  seven  different  benchmark  system  functions,  including  two  from  the  set  of  ACS  benchmark  cir¬ 
cuits.  These  results  have  demonstrated  the  feasibility  of  our  approach. 


Summary  and  Conclusions 
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6.  Brandon  Skaggs  was  responsible  for  the  development  of  the  logic  BIST  and  PUB  diagnosis. 
Rejoined  the  project  in  March  1999  and  received  BSEE  and  MSEE  degrees  in  May  1999  and 
August  2000,  respectively. 

7.  Matthew  Carter  was  responsible  for  the  initial  development  of  interconnect  BIST.  Rejoined 
the  project  in  May  1999  and  received  a  BSEE  degree  in  May  2000. 

8.  Matthew  Lashinsky  was  responsible  for  the  development  of  the  interconnect  BIST.  Rejoined 
the  project  in  January  2000  and  received  BSEE  and  MSEE  degrees  in  May  2000  and  December 
2001,  respectively.  Re  was  a  finalist  for  the  UNC-Charlotte  Cameron  Applied  Research  Award 
in  2001  for  his  work  on  this  project. 

9.  Jeremy  Nall  was  responsible  for  the  development  of  interconnect  diagnostics.  Re  joined  the 
project  in  February  2000  and  received  a  BSEE  degree  in  December  or  2001.  Re  will  graduate 
with  an  MSEE  in  May  2002.  Re  was  a  finalist  for  the  UNC-Charlotte  Cameron  Applied 
Research  Award  in  2002  for  his  work  on  this  project. 

10.  James  Roller  was  responsible  for  initial  development  of  interconnect  BIST.  Rejoined  the 
project  in  January  2001  and  received  a  BSEE  degree  in  May  2001. 

1 1.  Andrew  Taylor  was  responsible  for  interconnect  fault  tolerance.  Rejoined  the  project  in  Feb¬ 
ruary  2000  and  received  a  BSEE  degree  in  December  or  2001.  Re  expects  to  graduate  with  an 
MSEE  in  August  2002. 

12. Stanley  Baumgart  was  responsible  for  logic  fault  tolerance.  Rejoined  the  project  in  April 
2000  and  received  an  MSEE  degree  in  August  2001. 
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IS.Pankaj  Kataria  was  responsible  for  intereonnect  fault  toleranee.  He  joined  the  project  in  Sep¬ 
tember  1999  and  received  an  MSEE  degree  in  May  2001. 

14.  Jason  Cheatham  was  responsible  for  the  TREC.  He  joined  the  project  in  September  1999  and 
received  BSEE  and  BSCS  degrees  in  December  2001  and  May  2002,  respectively. 

15.  Michelle  Cheatham  was  responsible  for  the  GUI.  She  joined  the  project  in  May  2001  and 
received  BSCS  and  MBA  degrees  in  December  2001  and  May  2002,  respectively. 

16.  Thomas  Slaughter  was  responsible  for  the  fault  injection  emulator.  He  joined  the  project  in 
May  2001  and  expects  to  graduate  with  a  BSEE  degree  in  May  2003. 

IV.Vinay  Verma  worked  on  the  initial  design  of  the  BISTERs  during  the  summer  of  1998.  He  is  a 
Ph.D.  student  at  the  University  of  Illinois,  Chicago. 

18.  Mark  Boyd  worked  on  the  initial  implementation  of  the  roving  process  during  the  summer 
1999.  He  is  a  Ph.D.  student  at  the  University  of  California,  Santa  Cruz 

19. Xiaomlng  Yu  worked  on  the  problem  of  mixed-mode  diagnosis  during  the  summer  2001.  He 
is  a  Ph.D.  student  at  the  University  of  Illinois,  Champaign-Urbana. 

20.Shantauu  Dutt  (Assistant  Professor,  University  of  Illinois,  Chicago)  participated  in  the  initial 
discussions  on  the  fault-tolerance  features  of  the  project. 

The  project  has  resulted  in  a  number  of  patents  as  well  as  papers  and  presentations  at  conferences 

and  workshops,  along  with  invited  presentations  as  summarized  below: 

Patents: 

1.  M.  Abramovici  and  C.  Stroud,  “Fault  Tolerant  Operation  of  FPGAs,”  U.S.  Patent  #  6,256,758 
issued  7/3/2001. 

2.  M.  Abramovici  and  C.  Stroud,  “On-line  Testing  of  the  Programmable  Logic  Blocks  in 
FPGAs,”  pending,  application  no.  09/405,958,  filed  9/27/1999,  patent  has  been  allowed. 

3.  M.  Abramovici  and  C.  Stroud,  “On-line  Testing  of  the  Programmable  Interconnect  Network  in 
FPGAs,”  pending,  application  no.  09/406,219,  filed  9/27/1999,  patent  has  been  allowed. 

4.  M.  Abramovici  and  C.  Stroud,  “On-line  Diagnosis  of  the  Programmable  Logic  Blocks  in 
FPGAs,”  pending,  application  no.  09/671,853,  filed  9/27/1999. 

5.  M.  Abramovici,  J.  Emmert,  and  C.  Stroud,  “On-line  Fault  Tolerant  Operation  via  Incremental 
Reconfiguration  of  FPGAs,”  pending,  application  no.  09/61 1,449,  filed  7/6/2000. 

6.  M.  Abramovici  and  C.  Stroud,  “Diagnosis  of  the  Programmable  Interconnect  Network  in 
FPGAs,”  pending,  application  no.  09/994,299,  filed  11/26/2001. 

7.  M.  Abramovici,  J.  Emmert,  and  C.  Stroud,  “Interconnect  Fault  Tolerance  in  FPGAs,”  pending, 
application  no.  60/305,534,  filed  7/6/2001. 

8.  M.  Abramovici  and  C.  Stroud,  “BIST-Based  Delay  Fault  Testing  in  FPGAs”,  patent  application 
in  preparation,  to  be  filed  7/8/2002. 
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Journal  Papers: 

1 .  M.  Abramovici  and  C.  Stroud,  BIST-Based  Test  and  Diagnosis  of  FPGA  Logic  Blocks”,  IEEE 
Trans,  on  VLSI  Systems,  Vol.  9,  No.  1,  pp.  159-172,  2001. 

2.  J.  Emmert,  J.  Cheatham,  P.  Kataria,  C.  Stroud,  and  M.  Abramovici,  “Predicting  Performance 
Penalty  for  Fault  Tolerance  in  Roving  Self-Testing  AReas  (STARs),”  Lecture  Notes  in  Com¬ 
puter  Science,  Vol.  1896,  pp.  545-554,  2000. 

Conference  Papers  &  Presentations: 

1.  M.  Abramovici,  C.  Stroud,  S.  Wijesuriya,  C.  Hamilton,  and  V.  Verma,  “Using  Roving  STARs 
for  On-Line  Testing  and  Diagnosis  of  FPGAs  in  Fault-Tolerant  Applications,”  Proc.  Interna¬ 
tional  Test  Conf,  pp.  973-982,  1999. 

2.  M.  Abramovici  and  C.  Stroud,  “BIST-Based  Delay  Fault  Testing  in  FPGAs,”  to  be  published 
Proc.  IEEE  International  On-Line  Testing  Workshop,  2002. 

3.  C.  Stroud,  J.  Nall,  M.  Lashinsky,  and  M.  Abramovici,  “BIST-Based  Diagnosis  of  FPGA  Inter¬ 
connect,”  to  be  published  Proc.  IEEE  International  Test  Conf.,  2002. 

4.  M.  Abramovici,  J.  Emmert,  and  C.  Stroud,  “Roving  STARs:  An  Integrated  Approach  to  On- 
Line  Testing,  Diagnosis,  and  Fault  Tolerance  for  FPGAs  in  Adaptive  Computing  Systems,” 
Proc.  NASA/DoD  Workshop  on  Evolvable  Hardware,  pp.  73-92,  2001. 

5.  M.  Abramovici,  C.  Stroud,  and  J.  Emmert,  “Using  Embedded  FPGAs  for  SoC  Yield  Improve¬ 
ment,”  Proc.  AMC/IEEE  Design  Automation  Conf,  pp.  713-724,  2002. 

6.  M.  Abramovici  and  C.  Stroud,  “BIST-Based  Detection  and  Diagnosis  of  Multiple  Faults  in 
FPGAs,”  Proc.  IEEE  International  Test  Conf,  pp.  785-794,  2000. 

7.  M.  Abramovici,  C.  Stroud,  B.  Skaggs,  and  J.  Emmert,  “Improving  BIST-Based  Diagnosis  for 
Roving  STARs,”  Proc.  IEEE  International  On-Line  Testing  Workshop,  pp.  31-39,  2000. 

8.  C.  Hamilton,  G.  Gibson,  S.  Wijesuriya,  and  C.  Stroud,  “Enhanced  BIST-Based  Diagnosis  of 
FPGAs  via  Boundary  Scan  Access,”  Proc.  IEEE  VLSI  Test  Symp.,  pp.  4I3-4I8,  1999. 

9.  C.  Stroud,  M.  Lashinsky,  J.  Nall,  J.  Emmert,  and  M.  Abramovici,  “On-Line  BIST  and  Diagno¬ 
sis  of  FPGA  Interconnect  Using  Roving  STARs,”  Proc.  IEEE  International  On-Line  Testing 
Workshop,  pp.  31-39,  2001. 

10. J.  Emmert,  S.  Baumgart,  P.  Kataria,  A.  Taylor,  C.  Stroud,  and  M.  Abramovici,  “On-Line  Fault 
Tolerance  for  FPGA  Interconnect  with  Roving  STARs,”  Proc.  Defect  and  Fault  Tolerance  in 
VLSI  Systems  Conf,  pp.  445-454,  2001. 

I  I.T.  Slaughter,  C.  Stroud,  J.  Emmert,  and  B.  Skaggs,  “Fault  Injection  Emulation  for  Field  Pro¬ 
grammable  Gate  Arrays,”  Proc.  International  Society  for  Optical  Engineering  ITCOM,  pp.  1-9, 
2001. 

12.J.  Emmert,  C.  Stroud,  B.  Skaggs,  and  M.  Abramovici,  “Dynamic  Fault  Tolerance  in  FPGAs 
via  Partial  Reconfiguration,”  Proc.  Field-Programmable  Custom  Computing  Machines  Conf, 
pp.  165-174,  2000. 
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13J.  Emmert  and  J.  Cheatham,  “On-Line  Incremental  Routing  for  Interconnect  Fault  Tolerance  in 
FPGAs  Minus  the  Router,”  Proc.  Defect  and  Fault  Tolerance  in  VLSI  Systems  Conf,  pp.  149- 
157,  2001. 

MSEE  Theses: 

1 .  Sajitha  Wijesuriya,  “Built-In  Self-Test  for  Programmable  Interconnect  in  Field  Programmable 
Gate  Arrays,”  MSEE  thesis.  Dept,  of  Electrical  Engineering,  University  of  Kentucky,  May 
1999. 

2.  Carter  Hamilton,  “On-Line  Testing  and  Reconfiguration  of  Field  Programmable  Gate  Arrays,” 
MSEE  thesis.  Dept,  of  Electrical  Engineering,  University  of  Kentucky,  Aug.  2000. 

3.  Brandon  Skaggs,  “On-Line  Built-In  Self-Test  and  Diagnosis  of  Field  Programmable  Gate 
Array  Logic  Blocks,”  MSEE  thesis,  Dept,  of  Electrical  Engineering,  University  of  Kentucky, 
Aug.  2000. 

4.  Pankaj  Kataria,  “Incremental  Routing  Strategy  Based  on  Window  Rip-Up,”  MSEE  thesis. 
Dept,  of  Electrical  &  Computer  Engineering,  University  of  North  Carolina  at  Charlotte,  May 
2001. 

5.  Stanley  Baumgart,  “On-Line  Reconfiguration  of  FPGAs  for  Fault  Tolerant  Applications  in 
Adaptive  Computing  Systems,”  MSEE  thesis,  Dept,  of  Electrical  &  Computer  Engineering, 
University  of  North  Carolina  at  Charlotte,  Aug.  2001. 

6.  Matthew  Lashinsky,  “On-Line  Built-In  Self-Test  of  Field  Programmable  Gate  Array  Intercon¬ 
nect,”  MSEE  thesis,  Dept,  of  Electrical  &  Computer  Engineering,  University  of  North  Carolina 
at  Charlotte,  December  2001. 

7.  Jeremy  Nall,  “On-Line  Diagnosis  of  Field  Programmable  Gate  Array  Interconnect,”  MSEE 
thesis  in  preparation,  Dept,  of  Electrical  &  Computer  Engineering,  University  of  North  Caro¬ 
lina  at  Charlotte,  expected  graduation  May  2002. 

8.  Andrew  Taylor,  “Faulty  Interconnect  Reuse  and  Incremental  Routing  for  Enhanced  FPGA 
Interconnect  Fault  Tolerance,”  MSEE  thesis  in  preparation,  Dept,  of  Electrical  &  Computer 
Engineering,  University  of  North  Carolina  at  Charlotte,  expected  graduation  August  2002. 

Invited  Presentations: 

1 .  Design  Automation  Conference  (invited  tutorial),  New  Orleans,  LA,  June  2002. 

2.  University  of  Iowa,  Iowa  City,  lA,  April  2002. 

3.  Camegie-Mellon  University,  Pittsburgh,  PA,  May,  2001. 

4.  Institute  for  System  Engineering  Research  (INESC),  Lisbon,  Portugal,  July,  2001. 

5.  University  of  Porto,  Porto,  Portugal,  November,  2001. 

6.  Spotlight  on  Research  TV  Series,  Charlotte,  NC,  September,  2001. 

7.  University  of  Michigan,  Ann  Arbor,  MI,  March  2002. 

8.  Purdue  University,  Lafayette,  IN,  March  2002. 

9.  University  of  Kentucky,  Lexington,  KY,  March,  2002. 
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10.  VLSI  Test  Symposium,  Monterey,  CA,  April,  2002. 

1 1.  Texas  A&M  University,  College  Station,  TX,  April,  2002. 

12. Duke  University,  Raleigh,  NC,  to  be  presented  August  2002. 
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List  of  Acronyms 


ACS 

Adaptive  computing  system 

BIST 

Built-in  self-test 

BUT 

Block  under  test 

CAD 

Computer-aided  design 

CED 

Concurrent  error  detection 

CIP 

Configuration  interconnect  point 

CPLD 

Complex  programmable  logic  devices 

FABRIC 

Fault-bypassing  roving  configuration 

FF 

Flip-flop 

FPGA 

Field-programmable  gate  array 

FT 

Fault- tolerant 

GIU 

Graphical  user  interface 

H-STAR 

Horizontal  self-testing  area 

I/O 

Input/Output 

LUT 

Look-up  table 

MUX 

Multiplexer 

ORA 

Output  response  analyzer 

ORCA 

Optimized  reconfigurable  cell  array 

PLB 

Programmable  logic  block 

PUB 

Partially  usable  block 

RAM 

Random  access  memory 

RTR 

Run-time  reconfiguration 

SRAM 

Static  random  access  memory 

STAR 

Self-testing  area 

TAP 

Test  access  port 

TPG 

Test-pattern  generator 

TREC 

Test  and  reconfiguration  controller 

V-STAR 

Vertical  self-testing  area 

WUT 

Wire  under  test 
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